Handle with care


The possibility that H7N9 avian influenza may evolve sufficiently to cause a pandemic has scientists
turning again to controversial research — they must be careful how they justify the risks taken.


The H7N9 avian flu virus first reported in China in March has so
far infected at least 134 people, and killed 43 of them. Thankfully,
there are no signs yet that it can easily be transmitted
between people — instead it is sporadically being caught by humans
through contact with chickens and other fowl.

Researchers now want to make genetically engineered versions of
H7N9 that are more transmissible and pathogenic in mammals. In a
Correspondence published jointly this week in Nature and Science (see
page 150), 22 scientists, including Ron Fouchier of the Erasmus Medical
Center in Rotterdam, the Netherlands, and Yoshihiro Kawaoka of
the University of Wisconsin-Madison, argue that such research can
help to assess the ‘pandemic potential’ of H7N9. The dilemma is that
should such engineered strains be accidentally or deliberately released
from a lab, they could spark a flu pandemic.

The announcement is likely to prompt some replay of last year’s
debate over the creation by Fouchier and Kawaoka of lab strains of
H5N1 that could transmit between ferrets. And it offers the first test
of some of the review and oversight structures put in place for this
‘gain-of-function’ flu research. As this journal has said before, scientists
who push for such research should be wary of over-selling the
benefits to public health, at least in the short term, as a way to justify
the risks taken.

A sense of perspective is crucial here. The long-term benefits of such
work are clear — as long as it is done to the highest biosafety standards.
It will shed light on, for example, the mechanisms of virus transmissibility
and pathogenicity. But the immediate benefits to public health and
our short-term ability to counter the threat of H7N9 are less clear-cut.
Scientists cannot predict pandemics, so to assess the pandemic potential
of viruses — and to decide which strains warrant the manufacture
of trial vaccines — comes down to judgements of relative risk.

Tests of how flu viruses behave in animal models such as ferrets 
can certainly provide information on the risk of transmissibility and 
pathogenicity, although it can be difficult to extrapolate those results 
to humans. A rash of papers this year has shown that H7N9 does have 
limited airborne transmissibility in ferrets, although the virus is not 
transmitting between people in the current outbreak in China. 

Another way to assess pandemic potential is to monitor wild-type 
viruses for mutations that allow the virus more ready access to human 
cells. H7N9 has already acquired some of these mutations, which is 
why it infects humans more easily than does H5N1. But as researchers 
pointed out in June, there is no scientific evidence that such mutations 
predict the risk of a pandemic (D. M. Morens et al. N. Engl. J. Med. 368,
2345-2348; 2013). Transmissibility is more complex than that. 

In creating mammalian-transmissible versions of H7N9, scientists
would go a step further and hope to identify combinations of mutations 
that could increase virus transmissibility in ferrets or other models. 
Such work could yield information on the biological principles affect- 
ing transmission. But nature could well come up with combinations 
for transmission that are different from those obtained in experiments. 

Following the H5N1 controversy, the US Department of Health and 
Human Services has introduced an extra layer of review that will apply 
to anyone seeking funding for work to make mammalian-transmissi- 
ble strains of H7N9 (see page 151). The risks and benefits of the work 
will be assessed by a panel of experts in public health, security, risk 
assessment, law and ethics, and, importantly, any extra steps needed to 
mitigate biosafety risks will be considered. The way the review handles 
H7N9 will be an important test of the effectiveness and transparency 
of this new approach.


Blood ties 


Scientists should give donors more information 
about how their biospecimens are used. 


The family of Henrietta Lacks is finally getting a say in how
researchers can use her cells, six decades after her fatal cervical
tumour spawned the HeLa cell line. There is little doubt that
the controversy over the case contributed to the decision by the US
National Institutes of Health to consult her relatives about the future
use of her genome information (see pages 132 and 141). But people
who donate samples to biomedical research today are unlikely to find
out what happens to their material.
Standards of informed consent have improved since scientists
established HeLa without approval from Lacks or her family. But
research participants still have little control over how their tissues 
and data are used, and often never hear from the researchers again. 
Increasingly, volunteers are asked to give ‘broad consent’ for samples 
and data to be used in studies that may not have been conceived at the 
time of donation. In exchange, donors should have the option to learn 
how their specimens are being used — and even to withdraw consent. 
This already happens informally in some studies, but digital technolo- 
gies could allow researchers to keep patients updated. Imagine the thrill 
of giving a sample, logging on to a secure website years later and discov- 
ering that your specimen helped to develop a skin-cancer treatment. 
This continued contact with donors raises issues — not least how to 
ensure their anonymity. But researchers must also be honest and tell 
donors that privacy cannot be guaranteed, particularly for highly iden- 
tifiable genomic information. Some volunteers and their families are 
rightly proud that they are directly contributing to research. Funders 
and researchers should give more of them the chance to stay involved.
Balancing privacy with
public benefit


Maximizing access to research data will greatly benefit science, and users can
help to establish universal principles on how to do it, says Martin Bobrow.


This week's publication in Nature of a second HeLa cancer-cell
genome (see page 207) and the announcement from the US
National Institutes of Health on how it will control who gets to
use the sequence information (see page 141) highlight a growing issue in
modern science: access to biomedical and health-related research data.

The amount of such data continues to grow at breakneck speed, generated
by large epidemiological and cohort studies that track people's
health over many years (for example, the UK Biobank project) and by
studies that sequence the DNA of many individuals, such as the 1000
Genomes project. Researchers, funders and governments are becoming
increasingly aware of the potential power of linking and co-analysing
different data sets. Genomic data linked to large sets of patient records,
for example, might reveal connections about disease that we would not
otherwise discover, and data from the social sciences
could add further value to these studies.

Maximizing access to data resources should increase the chances that
scientists will make discoveries with medical benefits. As a result, most
major research funders require grant recipients to make any large data
sets they create available to other researchers. It is an ethical imperative
that we seek to maximize the value of research data generated from
human participants, particularly when using public funds.

In response to open-access policies, a trend is emerging of allowing
legitimate researchers access to research data before publication. In making
unpublished data available, however, two sets of interests need to be
safeguarded. Most research
participants expect privacy protection and do not want their genomes
or health records to be readily identifiable. Furthermore, researchers
who spend time, effort and ingenuity to generate, process and manage
large research data sets expect to get appropriate credit. This also relates
to emerging discussions about clinical trials: there is a need for more
access to patient-level data (as highlighted by the AllTrials campaign),
while respecting the terms of study participants’ consent.

To navigate these issues, many large genome and longitudinal studies
have set up specific data-access procedures, often overseen by committees.
This is what the National Institutes of Health has done for
the HeLa sequence. As the number of these data-access committees
grows along with the links between data sets, a question arises: is such
a piecemeal approach appropriate? The scientific and medical potential
of data will only be realized if researchers are not stymied by myriad
data-access mechanisms and by inconsistent ways of recording and
describing data variables. So, does biomedical


identified. As more research based on linked data sets emerges, it will 
be extremely important to understand how these data are being used, 
to quantify the risks and to devise proportionate governance that allows 
innovative uses of data to flourish while protecting participant confi- 
dentiality as far as possible. 

I chair the Expert Advisory Group on Data Access — a working group 
that has been set up to provide strategic advice on this issue to funders 
—and we need your help. We have already talked to those who produce 
and manage biomedical and social-science data. Now we want to hear 
from those who use the data, or who would like to use them in future. 

What does the regulatory landscape look like for potential data users? 
Are we maximizing the value of the data, and if not, why not? How 
many data sets are out there, in what fields, governed by how many data- 
access committees, operating to what standards? 

The remit of the working group is for UK- 
based funders, but our scope is international 
and we want to reach across both disciplinary 
and national borders. Still, so far we have found 
it extremely difficult to get an overview of the 
situation in the United Kingdom, let alone inter- 
nationally. This is partly because of the prolifera- 
tion of data in increasingly large, complex and 
heterogeneous data sets, but also because of the 
patchwork of regulations, standards and policies 
that govern the management of research data 
across the world. 

To help fill in the gaps, we are conducting an 
online survey of users of research data, and we 
would value the input of Nature readers. If you 
use shared data in your research, wherever you are in the world, I urge 
you to participate (see go.nature.com/bmun1x). 

We are interested in, for example, how controlled access to data 
affects your projects and how easy you find it to locate the right data 
sets. We know that some types of data are held in central repositories, 
but others are held, managed and formatted within local or institu- 
tional data-management systems that are known only to a small group 
of collaborators. 

The challenge in data access is to achieve an appropriate balance. 
On the one hand, managers need to rigorously safeguard the interests 
of research participants and to apply serious sanctions against anyone 
who wilfully misuses their data. On the other hand, they also need 
to ensure that research data are accessible to legitimate researchers 
without undue costs and delay. All of this will be greatly helped if 
there is wide agreement on principles for structure, governance and 
use of shared data.


Martin Bobrow is an emeritus fellow at the University of Cambridge, 
UK. 
e-mail: mb238@cam.ac.uk 
Flu makes
bacteria go bad 


A normally harmless bacterial 
biofilm can take a dangerous 
turn when exposed to a virus. 

Streptococcus pneumoniae 
can trigger bacterial 
pneumonia, but also colonizes 
the nose or throat passages of 
up to 15% of healthy adults. 
To learn how it might become 
pathogenic, researchers led 
by Anders Hakansson at 
the State University of New 
York in Buffalo grew films 
of S. pneumoniae on a layer 
of human epithelial cells of 
the type that normally lines 
airways. 

Infecting the epithelial cells 
with influenza virus caused 
bacteria to disperse from the 
biofilms, as did mimicking 
flu-induced conditions such as 
fever. Dispersed S. pneumoniae 
invaded normally uncolonized 
sites including the lung and 
middle ear in mouse studies, 
where they caused oedema and 
inflammation. They also had a
more virulent gene-expression
pattern than bacteria growing
in biofilms or in standard
laboratory conditions.
‘Interkingdom signalling’
could be key to inducing
disease, the authors say.
mBio 4, e00438-13 (2013)


Heavy-metal stars
make lead clouds


Two helium-rich stars contain
more of the element lead than
astronomers have ever seen.
The stars may represent an
intermediate stage of stellar
evolution in which heavy
metals can become enriched
and form cloud-like layers.

A team led by Naslim
Neelamkodan and
Simon Jeffery of Armagh
Observatory near Belfast,
UK, analysed light from nine
subdwarf stars and discovered
the two lead-rich stars about
250 parsecs and 300 parsecs
from Earth, respectively. The
researchers think that the stars
could each contain as much as
100 billion tonnes of lead — up
to 100 times the amount found
in normal subdwarf stars.
Starlight exerts radiative
pressure, a slight force that can
nudge particles. This may drive
the heavy metal to separate
into a thin atmospheric layer,
the team suggests.
Mon. Not. R. Astron. Soc.
http://dx.doi.org/10.1093/mnras/stt1091 (2013)


Selections from the
scientific literature


Travelling zebras forecast the weather


Zebras in Botswana heed subtle weather and
vegetation clues when choosing when and how
to move to greener pastures.

Hattie Bartlam-Brooks of the University
of Bristol, UK, and her colleagues fitted adult
zebras (pictured) with tracking collars and
monitored them daily on their annual migration
from the Okavango Delta to the Makgadikgadi
grasslands, around 250 kilometres away. They
compared the migrations of seven mares with
models informed by satellite data on regional
vegetation and rainfall. The animals seemed to
use local cues to anticipate the food and water
available at their destination and adjust their
movements accordingly — for example, by
delaying departure or reversing direction when
rainfall was unseasonably late.

Zebras’ forecasting skills might help them to
adjust to environmental and climate change, the
authors note.

J. Geophys. Res. Biogeo.
http://dx.doi.org/10.1002/jgrg.20096 (2013)


Molecular
switches in RNA


An RNA-protein complex
regulates gene expression in an
unanticipated way.

The cellular machines
called spliceosomes
reconfigure transcribed
RNA into its mature,
protein-coding form. The
‘minor spliceosome’ is less
than 1% as abundant as
the major spliceosome, but
exists in plants, fungi and
animals, with precursors
to hundreds of human
messenger RNAs containing
a section removed only by
the minor spliceosome.
Researchers led by Gideon
Dreyfuss at the University
of Pennsylvania School of
Medicine in Philadelphia
showed that a component
of the minor spliceosome,
an RNA-protein complex
called U6atac, is extremely
unstable and normally
limits the machine's activity.
When cells are stressed,
however, signalling enzymes
stabilize U6atac, boosting
its levels and increasing
the production of mature
RNAs. This allows the minor
spliceosome to perform as a
quick-acting valve to switch
on the production of certain
proteins.

eLife 2, e00780 (2013)


SOCIAL BIOLOGY 


Insects show savvy 
mob mentality 


Among social insects, 
individuals excel at easy 
decisions and colonies do 
better at subtle distinctions — 
but big swarms may snag the 
best options. 

When house hunting, 
Temnothorax ants (pictured) 
recruit each other to 
potential new homes. Takao 
Sasaki at Arizona State 
University in Tempe and his 
colleagues forced isolated 
ants and whole colonies to 
choose between two nest 
sites. When one site was 
only slightly better than 
another, the colonies did 
better than individuals at 
picking the best site. When 
site differences were larger, 
the ants performed better as 
individuals. 

But big colonies are 
better when there are more 
choices to be made. Timothy 
Schaerf and colleagues at 
the University of Sydney, 
Australia, used mathematical 
models and field experiments 
to compare how honeybees 
(Apis mellifera) picked out 
a new nesting site. Swarms
of 15,000 bees chose ideal 
locations more often than 
those with 5,000 bees. Huge 
swarms contain more scouts, 
and so can collect and 
compare more options for 
potential digs. 

J. R. Soc. Interface 10,
20130533 (2013); Proc.
Natl Acad. Sci. USA
http://dx.doi.org/10.1073/pnas.1304917110 (2013)


EVOLUTION 


Mammals and 
monogamy 


Some mammals may have 
turned to pair-living because 
of infanticide or isolated 
females. 

Using an evolutionary 
tree of 230 primates as a 
framework, Christopher Opie 
of University College London 
and his colleagues ran 
simulations of evolutionary 
history to investigate what 
conditions might produce 
the behaviours of modern 
primates. They conclude that 
monogamy arose after males 
began guarding females to 
stop rivals from killing their 
offspring. 

Tim Clutton-Brock and 
Dieter Lukas at the University 
of Cambridge, UK, used a 
similar method to study how 
monogamy came about in 
mammals generally. Using 
an evolutionary tree of more 
than 2,000 species, they found 
that monogamy tended to 
arise when females lived alone 
and were widely dispersed. 
Pair-living probably arose 
because males could not 
cover a large enough area to 
monopolize more than one 
female. 

Proc. Natl Acad. Sci. USA
http://dx.doi.org/10.1073/pnas.1307903110 (2013);
Science 341, 526-530 (2013)
For a longer story on this research, 
see go.nature.com/glatpz 


MOLECULAR PSYCHIATRY 


A factor for autism 
and schizophrenia 


Deficits in a protein that binds 
RNA may be a common risk 
factor for disorders including 
schizophrenia, autism and 
cognitive impairment. 

Nelson Freimer at the 
University of California, 
Los Angeles, Utz Fischer of 
the University of Würzburg,
Germany, and their colleagues 
studied a population in 
northern Finland in which 
such disorders are particularly 
frequent. They discovered 
that many people in this 
region are missing a small
part of a chromosome that
includes the gene for the RNA-binding
protein TOP3β. The
mutation increases the risk
of schizophrenia and several
other neurodevelopmental
disorders.

TOP3β interacts with
FMRP, a protein associated
with Fragile X syndrome and
autism. In a separate paper, Sige
Zou and Weidong Wang of the
National Institute on Aging
in Baltimore, Maryland, and
their colleagues characterized
these interactions and showed
that mutations in TOP3β or
in FMRP can cause abnormal
development of synapses in
flies and mice.

Nature Neurosci.
http://dx.doi.org/10.1038/nn.3484;
http://dx.doi.org/10.1038/nn.3479 (2013)


Ships acidify oceans

HIGHLY READ on wiley.com

Pollution from ships can make the
waters of heavily trafficked trade routes
more acidic, and may contribute to
local acidification on a scale similar to that resulting from
increased atmospheric carbon dioxide.

David Turner at the University of Gothenburg in
Sweden and his colleagues modelled the effects of shipping
emissions in the world’s waters using grids of 1 degree
longitude and latitude. This fine detail suggested that ships’
emissions of sulphur oxide and nitrogen oxide acidify the
water in some busy Northern Hemisphere coastal areas
by up to 0.002 pH units each summer. Regulations that
allow ships to reduce emissions to the air by ‘scrubbing’
fuel exhaust with sea water may accelerate acidification by
transferring acid to surface waters.

Although not a significant driver of ocean acidification
globally, shipping acidification could be a concern where high
traffic occurs near fisheries or important regions of marine
biodiversity, the authors say.

Geophys. Res. Lett. 40, 2731-2736 (2013)


Cheating orchids 
turn over new leaf 


Many orchids lure pollinators 
with unfulfilled promises 

of nectar, but some former 
deceivers have evolved the 


ability to produce the 
sugary reward. 
Up to 40% of 
orchid species 
are thought 
to be floral 
fibbers, including 
many members of the diverse 
Disa genus of African orchids 
(Disa uniflora pictured). 
Steven Johnson and his 
colleagues at the University 
of KwaZulu-Natal in South 
Africa analysed 111 Disa 
species, characterizing each 
for the presence or absence of 
nectar. 
By mapping these data on 
to an evolutionary tree of 
the genus, the researchers 
showed that nectar 
production evolved from 
deceitful ancestors nine 
times and was lost once. The 
authors speculate that such 
transitions may be guided 
by ecological circumstances 
that favour the fitness of one 
system over the other. 
Biol. Lett. 9, 20130500 (2013) 
Legalizing pot


Uruguay has moved closer to 
becoming the first country 
to legalize and regulate 
marijuana at a national level, 
after the country’s House of 
Representatives approved a 
controversial bill on 31 July. 
Backed by President José 
Mujica and his Broad Front 
coalition, the measure passed 
by 50 votes to 46. The bill
is also expected to pass the
Senate. The United Nations’
International Narcotics
Control Board criticized
the move, saying that such a
law, if enacted, would violate
international drug-control
treaties to which Uruguay is
a party.


Redefine cancer 

The word ‘cancer’ should be
used to define only tumours 
or lesions that are likely to 
become lethal if left untreated, 
recommends a working group 
of the US National Cancer 
Institute. In an editorial 
published in the Journal 

of the American Medical 
Association on 29 July, the 
authors argue that the use of 
the term to describe non-life- 
threatening conditions can 
lead to needless treatment 

(L. J. Esserman et al. J. Am. 
Med. Assoc. http://doi.org/nb9; 
2013). The group suggests 
that physicians should reduce 
screening frequency and focus 
on high-risk patients. It also 
calls for the development 

of better methods to tell 
aggressive conditions from 
non-threatening ones. 


UC open access

The University of California
(UC) faculty has adopted
an open-access policy for
research articles authored by
its members. The policy was
adopted on 24 July and publicly
announced on 2 August, and
it will be phased in over the
next year. The ten-campus UC
system is the latest of more
than 175 universities to make
research freely available. But,
as with other institutions,
researchers can choose to opt
out of the policy — a provision
that critics say renders it
toothless. See go.nature.com/xydons
for more.


Double coffin found at Grey Friars


Archaeologists at the University of Leicester,
UK, last week announced a second major find
at the site of Grey Friars Church — a limestone
coffin that, when opened, revealed an inner
coffin of lead (pictured). A pair of feet visible
through a hole in the otherwise mostly intact
lead casket probably belongs to one of three
prestigious figures known to be buried at the
site, the researchers say. The team unearthed
the 2-metre-long stone coffin last September,
but was not able to open it until last month.
The lead coffin will be opened after researchers
can determine the approach least likely to
damage its contents. In September 2012, the
team discovered the remains of English king
Richard III at the site.


Animal restrictions
The Italian parliament on
31 July agreed on extreme
restrictions for animal
research. The restrictions,
which include banning the
use of animals in addiction
studies, were added as
amendments during the
implementation of a 2010
European Union (EU)
directive that is already
considered to impose some
of the strictest regulations on
animal research in the world.
Critics note that individual EU
member states cannot legally
add further amendments to
the directive, and the law has
yet to be finalized by the Italian
government. See go.nature.com/nmtb4x
for more.


MIT Swartz report


A report commissioned by
the Massachusetts Institute
of Technology (MIT) in
Cambridge highlights the
university's failure to take
leadership in the case of
Aaron Swartz, a programmer
and Internet activist who
committed suicide in January
(see go.nature.com/5wmeld).
Swartz, 26, was awaiting trial
for illegally downloading
millions of articles from the
JSTOR archive through the
university’s network. The
review, released on 30 July and
led by MIT computer scientist
Hal Abelson, noted that the
university took a ‘hands-off’
approach to the case and
did not wield its expertise
and reputation to shape
information-policy reform.


Fraud case settled 


Northwestern University
in Evanston, Illinois, will
pay the US government
US$2.93 million to settle
claims that cancer researcher
Charles Bennett misused
federal research grants from
the National Institutes of
Health to pay for family travel
and to employ unqualified
relatives as consultants 
between 2003 and 2010. 
Northwestern did not admit 
liability in the settlement, 
announced on 30 July. 
Bennett, who now directs the 
Center for Medication Safety 
and Efficacy at the South 
Carolina College of Pharmacy 
in Columbia, has denied the 
allegations through his lawyer. 


Beefed-up burger 
Researchers have served up 
the first order of lab-grown 
beef burgers, pan-fried for 
a 5 August press event in
London. Mark Post, a tissue
engineer at Maastricht
University in the Netherlands,
and his colleagues created
the synthetic patties from 
cow muscle cells, which form 
strands when grown in the 
lab. About 20,000 strands 
make up a fatless patty, which 
costs more than €250,000 
(US$332,000) to make. The 
event's two tasters, a writer 
and a researcher, said that 

the burgers were beefy but 
not juicy. 


Science writer dies 


Science writer David Dickson 
has died, it was reported 

on 2 August. Born in 1947, 
Dickson worked as Nature’s 
news correspondent in 


TREND WATCH 


An analysis has revealed 
the importance ofa large 


programme to find arthropod- 


borne viruses (arboviruses), 


sponsored by the Rockefeller 
Foundation in New York. Other 
pathogenic viruses have been 
discovered at a constant rate of 
about 2 per year for 60 years; 
but the 1951-66 Rockefeller 
programme coincided with a 
surge in arbovirus discovery, 
say researchers (R. Rosenberg 
et al. Proc. Natl Acad. Sci. USA, 
http://dx.doi.org/10.1073/ 
pnas.1307243110; 2013). 


Washington DC from 1978 

to 1982. He also worked for 
Science and New Scientist, 

and later returned to Nature 
as news editor (pictured 

in the Nature office in the 
mid-1990s). In 2001, Dickson 
founded SciDev.Net, a science 
news service for the developing 
world, and served as the 
organizations director and 
editor. He retired in 2012. The 
Association of British Science 
Writers honoured Dickson 
last year with a lifetime 
achievement award. Philip 
Campbell, Nature’s editor-in- 
chief, paid tribute to David's 
“powerful combination of deep 
knowledge, uncompromising 
standards and relentless 
advocacy”. 


NSF nominee 

On 31 July, astrophysicist
France Córdova was
announced as US President
Barack Obama's choice to
lead the National Science
Foundation (NSF). If
confirmed by the Senate, she
would replace Subra Suresh,
who left the US$7-billion
agency in March. Córdova,
a former chief scientist for
NASA, was president of
Purdue University in West
Lafayette, Indiana, from 2007
to 2012. See go.nature.com/ufpogx
for more.


NOAA head 


Oceanographer and former 
astronaut Kathryn Sullivan 
was nominated on 1 August 
to lead the US National 
Oceanic and Atmospheric 
Administration (NOAA). 
Sullivan, who in 1984 became 
the first US woman to walk 
in space, has been the acting 
administrator at the 
US$5-billion agency since 
February. She now awaits 
Senate confirmation. See
go.nature.com/e371zk
for more.


BUSINESS
Antibiotics deal 


Cubist Pharmaceuticals
announced on 30 July
that it would spend
US$1.6 billion to expand
its antibiotics pipeline by
buying two companies,
Trius Therapeutics, based in
San Diego, California, and
Optimer Pharmaceuticals
of Jersey City, New Jersey.
The deal will give Cubist,
headquartered in Lexington,
Massachusetts, a number of
early-stage antibiotic drug
candidates, as well as Dificid
(fidaxomicin), Optimer’s
marketed treatment for
Clostridium difficile-associated
diarrhoea, and
tedizolid, Trius’s antibiotic
against drug-resistant
Staphylococcus aureus, which
is in late-stage clinical trials.


THE PACE OF HUMAN VIRUS DISCOVERY


The discovery of arthropod-borne viruses has declined since a peak
in the 1960s.


SEVEN DAYS | THIS WEEK


11-16 AUGUST

The Gordon Research
Conference on Biology
of Aging takes place in
Lucca, Italy, highlighting
developments in
knowledge about the
loss of molecular,
cellular and organismal
stability during ageing.
go.nature.com/nsv8c4


14-16 AUGUST
Computer scientists and
information experts
discuss advances in
computer, mobile
and network security
at the 22nd USENIX
Security Symposium in
Washington DC.
go.nature.com/Igaya8


Grants cancelled 
The US National Science 
Foundation (NSF) has 
scrapped political-science 
funding opportunities for 
the remainder of 2013. The 
agency, which normally 
spends about US$10 million 
annually on such research, 
has not stated its reasons for 
the decision. But political 
scientists speculate that it 

is related to a congressional 
restriction, signed into law by 
US President Barack Obama 
on 26 March, that requires 
NSF-funded political-science 
research to benefit either 
national security or economic 
interests. See go.nature.com/jmligd
for more.
Scientists debate the kindest laboratory kill p.130
Historic deal forged over famous HeLa cell line p.132
Technique offers a way to gauge black hole spin p.135
The weird world of metamaterials p.138
Chemical engineer Kemal Gürüz was taken to court in Turkey as part of controversial terrorism trials.


Scientists swept up 
in terrorism trials 


Turkish government ignores calls that trials are unfair. 


BY ALISON ABBOTT 


» 


ing after we left him,” says Hans-Peter 
Zenner of his visit to academic Fatih 
Hilmio§glu in the Silivri prison, some 80 kilo- 
metres from Istanbul. 
Zenner, a physician at the University of 
Tubingen in Germany, travelled to the penal 


cc lE was so emotional — we were cry- 


facility in February as part of a small delega- 
tion to investigate cases of Turkish academics 
charged with terrorism offences. The delegation 
had been commissioned on behalf of an inter- 
national human-rights network representing 
academies and scholarly societies including the 
US National Academies of Science and the Ger- 
man National Academy of Sciences Leopoldina. 
It concluded in its draft report on 1 August that
the scientists had not received fair trials.

On 5 August, the defendants — all former or 
current university rectors — received harsh sentences.
Hilmioğlu, a physician and former rector
of İnönü University in Malatya, was sentenced
to 23 years in prison on charges of conspiring 
to destabilize the government through politi- 
cal violence. Four other academics — includ- 
ing a transplant surgeon and chemical engineer 
— were given sentences of between 10 and 
15 years. Another, who had been in detention 
for more than 4 years, was released despite being 
sentenced to 12 years and 6 months in prison. 

The delegation had concluded in its report 
that “standards of justice failed”, and that in no
case did the evidence brought by prosecutors
“support the conclusion that any of our ... colleagues
is guilty of committing the crimes of
which they have been accused”. It called for an
amnesty for all six academics, or for each to 
receive a new trial “that meets international 
fair trial standards”. 

“It’s terrible — though I hadn’t been optimistic,”
says Carol Corillon, executive director of
the International Human Rights Network of 
Academies and Scholarly Societies in Wash- 
ington DC and co-author of the report along 
with Zenner and Peter Diamond, a Nobel 
prizewinning economist at the Massachusetts 
Institute of Technology in Cambridge. “It is a 
miscarriage of justice — there was no evidence 
at all against any of them. We have not had 
time to think of our next step, but we never, 
ever drop a case.” 

“This international delegation reflects what 
many of us believe — that there were many 
irregularities,” says economic historian Şevket
Pamuk, who is foreign secretary of the Bilim
Academy in Istanbul, Turkey’s independent
national academy of sciences. “Many have
argued that evidence was fabricated.”

The scientists were sentenced as part of 
a trial code-named Ergenekon, in which 
275 people, mostly military personnel, were 
accused of participating in a purported ‘deep- 
state’ network that the government believed 
had intended to facilitate a military coup. 
Observers describe the trial as a stand-off 
between a secularist old guard, which held 
power until 2003, and the current mildly 
Islamic government of Recep Tayyip Erdoğan.
Erdoğan has increased his majority in parliament
since first taking power, and is becoming
more confident, says Pamuk. Critics
suspect that the prominent academics in
the Ergenekon trial are being punished
for stances unrelated to terrorism: all six are 
staunch secularists and have defended sec- 
ularism by, for example, seeking to uphold a 
ban on headscarves in Turkey's universities. 

The delegation’s report also describes 
the plight of scientists, including one of the 
six just sentenced, accused in three further 
major political trials — known as Sledge- 
hammer, the KCK Operations and the 
Postmodern Coup. All the political trials 
had been assigned to special anti-terrorism 
courts, but these were abolished last year 
following criticisms that they ignored evidence.
Many observers believe that the trials,
which the report described as “highly
irregular”, were used as an excuse to round
up and silence government critics. 

The Sledgehammer trial, which involved 
365 people charged with attempting a mili- 
tary coup in 2003, ended last year. Industrial 
engineer Faruk Yarman, one of just two civil- 
ians to be charged, was sentenced to 13 years 
in prison. The report calls for his release. 

Political scientist Büşra Ersanlı was
arrested in 2011 as part of the KCK Opera- 
tions, and charged with membership of a 
violent Kurdish-rights organization. She 
was released from pre-trial detention in July 
2012; the report calls for a fair and expedi- 
tious trial for her. 

And Kemal Gürüz, one of the six scientists
sentenced this week, was arrested and
detained in June last year as part of the Postmodern
Coup trial (see Nature http://doi.org/h47; 2012).

A chemical engineer who was head 
of Turkey’s Council of Higher Education
from 1995 to 2003, Gürüz had been
a vociferous proponent of the headscarf 
ban. He attempted suicide in prison in 
June this year. This week, he was sentenced 
to 13 years and 11 months in prison in the 
Ergenekon trial; he is still awaiting trial 
under the Postmodern Coup. 

Pamuk, who holds joint positions at the 
Bosphorus University in Istanbul and the 
London School of Economics, says that 
many academics believe scientists such as 
Gürüz have been drawn into terrorism trials
for reasons of revenge. “Many were university
rectors,” he says. “When they were
powerful, they may have offended those 
who are now close to this government and 
are now in a position to retaliate.” Pamuk
expects all the scientists sentenced in the 
Ergenekon trial to appeal, but says that they 
are unlikely to win. 

Güniz Gürüz, a chemical engineer at the
Middle East Technical University in Ankara
and Kemal Gürüz’s wife, told Nature that
pressures on the family have been extreme. 
“We are going to appeal,” she said through 
tears after hearing his long sentence, adding,
“Kemal has never had a proper explanation
for why he was detained.”
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Much of the discussion about the humane killing of research animals centres on rodents. 


ANIMAL RESEARCH 


Best way to kill lab 
animals sought 


Researchers debate most humane methods of dispatch. 


BY DANIEL CRESSEY 


Killing research animals is one of the
most unpleasant tasks in science, and
it is imperative to do it as humanely as 
possible. But researchers who study animal wel- 
fare and euthanasia are growing increasingly 
concerned that widely used techniques are not 
the least painful and least stressful available. 
This week, experts from across the world will 
gather in Newcastle upon Tyne, UK, to debate 
the evidence and try to reach a consensus. 
“There are lots of assumptions made about 
the humaneness of various techniques for 
euthanizing animals,” says Penny Hawkins,
deputy head of the research animals
department at the Royal Society for the Prevention
of Cruelty to Animals, a charity based
in Southwater, UK. “Sometimes an animal 
might not appear to be suffering, but might be 
conscious and suffering.” 

Much of the debate centres on rodents, 
which make up the vast majority of research 
animals. Current techniques for killing them 
include inhalation methods — such as cham- 
bers that fill with carbon dioxide or anaesthetic 
gases — and injecting barbiturates. Physical 
methods include cervical dislocation (break- 
ing of the neck), or decapitation with specialist 
rodent guillotines (see ‘Methods used to kill 
lab rats’). 

PROS AND CONS

Methods used to kill lab rats

Some methods recommended by the
American Veterinary Medical Association.

Barbiturate injection: Fast-acting, but
injection may cause pain.

Inhaled anaesthetic (halothane, isoflurane,
sevoflurane or desflurane): Useful when
restraint of animal is difficult.

Carbon dioxide: Acceptable, but chamber
must be filled over several minutes and not
pre-filled. Death to be verified afterwards or
ensured by physical method.

Cervical dislocation: Causes rapid death,
but skill must be learned.

Decapitation: Useful when tissues must be
free of euthanasia chemicals.

Unacceptable: Nitrous oxide alone; nitrogen
or argon asphyxiation (unless animals
already anaesthetized); opioids.

PASCAL GOETGHELUCK/SPL

Experts hotly debate which method is
preferable. The most-discussed question at
the meeting is likely to be about the use of CO₂.

“People do still worry about CO₂ and it’s still
almost certainly the most widely used method 
[for killing rodents],” says Huw Golledge, who 
studies the anaesthesia of lab animals at New- 
castle University, UK. Golledge organized the 
meeting, which is backed by the London-based 
National Centre for the Replacement, Refine- 
ment and Reduction of Animals in Research. 
Its aim is to update a 2006 consensus docu- 
ment produced by international experts to give 
guidance to researchers working with animals. 

CO₂ is used to make rodents unconscious.
They are then killed by either asphyxiating
them with the gas, or by another method. But
increasingly, studies suggest rodents find CO₂
stressful.

Evidence for this comes mainly from ‘aversion
studies’. A key study by animal welfare
researcher Daniel Weary’s group at the University
of British Columbia in Vancouver,
Canada, shows that albino rats will move away
from a dark compartment filling with CO₂ into
a brightly lit box, despite disliking bright lights.
The study found that they were less likely to
move away from isoflurane, also used in euthanasia
(D. Wong et al. Biol. Lett. http://doi.org/ncv; 2012).

Other evidence is contradictory (H. Valentine 
et al. J. Am. Assoc. Lab. Anim. Sci. 51, 50-57; 
2012), but Weary is firm in his beliefs. “Our own 
results indicate CO₂ is highly aversive,” he says.

There are also question marks over physical 
methods. Performed perfectly with animals 
accustomed to being handled, cervical disloca- 
tion may be the best method, but it may not be 
practical for killing large numbers of rodents. 

The issues are even more uncertain for the 
new animal models that scientists are pursu- 
ing. For example, a huge increase in the use of 
zebrafish has put them on the meeting's agenda. 
Although much progress has been made with 
lab rodents, says Weary, “there's been much less 
work on fish welfare in general”. 

Widely used guidelines on animal eutha- 
nasia from the American Veterinary Medical 
Association (AVMA) in Schaumburg, Illinois, 
were updated earlier this year, in part to adapt 
to changes in the animals used in labs, with 
zebrafish guidance one of the additions. The 
association says it expects the lab animals sec- 
tion of these guidelines to continue to expand. 

Some of this guidance comes with regulatory
teeth. The US National
Institutes of Health, which funds biomedical
research, says that it expects “full implementation”
of the AVMA guidelines later this year, with previously
approved projects reviewed using them.

Regulation is also driving more unusual ani- 
mals onto the agenda. New legislation on the 
treatment of laboratory animals is currently 
being incorporated into the laws of European 
Union member states and will cover cephalo- 
pods, in some nations for the first time (see 
Nature http://doi.org/fk65pb; 2011).
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Squeezed light mutes 
quantum noise 


Silicon zip reduces energy fluctuations in laser beams to 
improve sensitivity of optical motion sensors. 


BY DEVIN POWELL 


Oskar Painter spends his time carving
silicon blocks into shapes that interact
with light in strange ways. The
latest of these — what Painter calls “Legos for 
adults” — squeezes the light from laser beams 
to push the limits of what they can measure. 

By eliminating some of the noise caused 
by quantum effects, researchers can use 
squeezed light to illuminate movements 
too small to see with normal light. Painter's 
silicon sculpture, built on a microchip, could 
boost the sensitivity of sensors that use lasers 
to monitor motion, such as the gyroscopes 
that keep track of an aircraft's orientation. 

“We have an opportunity to push the 
performance of these sensors by orders of 
magnitude,” says Painter, an applied physi- 
cist at the California Institute of Technol- 
ogy in Pasadena, whose team reports how it 
squeezes light on page 185 (ref. 1). 

All light is plagued by quantum noise, 
especially at the low powers typically 
required by sensors. These energy fluctua- 
tions blur the defined peaks of classical light 
waves, fundamentally limiting the precision 
of measurements. 

Squeezing the light can suppress some 
noise, but Heisenberg’s uncertainty prin- 
ciple demands a trade-off. A squeeze that 
reduces noise in one dimension — the height 
of a light wave’s peaks, for instance — must
be balanced by a stretch that adds noise in 
another, such as the distance between the 
peaks. Researchers therefore have to match 
the direction of the squeezing to the direction 
of the measurement. 

Efforts to put light-squeezing to use have 
so far focused on gravitational-wave detec- 
tors, which search for faint ripples in space- 
time by timing laser beams as they bounce 
between mirrors 4 kilometres apart. Passing 
ripples should stretch or compress the laser 
beams ever so slightly. But measurements 
with normal laser light are limited by quan- 
tum noise, and have so far failed to detect 
any disturbances attributable to gravitational 
waves. 

Hoping to improve the next generation of 
measurements, researchers at the Laser Interferometer
Gravitational-Wave Observatory
(LIGO) in Hanford, Washington, added a
dose of squeezed light by passing laser light 
through a crystal. In July, they reported that 
they had achieved a sensitivity better than the 
standard limit imposed by quantum noise (ref. 2).
This represents a step towards the ultimate 
goal of doubling LIGO’s sensitivity, says team 
member Nergis Mavalvala, a physicist at the 
Massachusetts Institute of Technology in 
Cambridge. “We have to work hard to strip 
the noise out of the light,” she says. 

Painter's silicon device potentially offers a 
simpler way to squeeze light, although only 
at frequencies too high to be useful for gravitational-wave
detectors. The device looks
like a zip; photons bouncing around
between its two arms push them apart with
a force dictated by the amount of noise
in the light. As the
size of the gap changes, the zip tunes the frequency
of the light — just as a finger sliding
along a guitar string changes the pitch of the
sound produced — and squeezes out some of
the fluctuations.
The prototype tends to leak light, so it can 
suppress only about 5% of the noise. “The 
absolute level of squeezing is relatively low,” 
says Warwick Bowen, a physicist at the Uni- 
versity of Queensland in Brisbane, Australia. 
Painter says he will next be working with 
higher-quality zips, which could cut out as 
much as 90% of the noise. 
But he will have some competition. A team 
at JILA in Boulder, Colorado, a joint institute 
of the University of Colorado and the US 
National Institute of Standards and Technol- 
ogy, has already created a vibrating silicon 
nitride membrane that boasts a 32% reduc- 
tion in noise. JILA physicist Cindy Regal and 
her colleagues will report their work in a paper 
under review at Physical Review X (ref. 3). “It 
has been technically challenging to get to this 
regime,” says Regal.
1. Safavi-Naeini, A. H. et al. Nature 500, 185-189 (2013).
2. The LIGO Scientific Collaboration Nature Photon. 7, 613-619 (2013).
3. Purdy, T. P., Yu, P.-L., Peterson, R. W., Kampel, N. S. & Regal, C. A. Preprint at http://arxiv.org/abs/1306.1268 (2013).
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Henrietta Lacks with her husband David. 


Deal done over 
HeLa cell line 


Family of Henrietta Lacks agrees to release of genomic data. 


BY EWEN CALLAWAY 


Deborah Lacks wanted answers. In 1974,
she asked a leading medical geneticist to
tell her about HeLa cells, a tissue-culture 
cell line derived from the cancer that had killed 
her mother Henrietta in 1951. The researcher, 
who was collecting blood from the Lacks fam- 
ily to map HeLa genes, autographed a medical 
textbook he had written and said that everything 
she needed to know lay within its dense pages. 
It would be more than 30 years before the 
family got a better explanation. 
Now the director of the US National
Institutes of Health (NIH), Francis Collins, is
trying to make up for decades of slights. Over 
the past four months, he has met Lacks family 
members to answer questions and to discuss 
what should be done with genome data from 
their matriarch’s cell line. 

“We wanted to get a better understanding 
of what information was going to be out there 
about Henrietta, and what information was 
going to be out there about us,” says Henrietta’s
grandson David Lacks Jr. (Deborah Lacks died 
in 2009.) On 7 August, Collins announced that 
the family has endorsed case-by-case release 
of the information, subject to approval by a 
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committee that will include family members 
(see page 141). 

The consensual approach is a sea change 
from the dismissive treatment of the past, says 
Rebecca Skloot, the journalist who recounted 
the scene between Deborah Lacks and the 
researcher in her 2010 book The Immortal 
Life of Henrietta Lacks. “It was the first time 
in the very long history of HeLa cells that any 
scientists have sat down and devoted complete 
attention to explaining to the family what was 
going on,” she says (see ‘The Lacks legacy’).

The agreement allows the publication of a US
government-funded HeLa genome sequence 
as well as the re-release of data that were 
pulled from public view soon after publica- 
tion in March because of the family’s concerns. 
Nature’s News team learned of the negotia- 
tions last month but agreed to delay coverage 
so as not to impede the talks. Brokered during 
meetings at Johns Hopkins School of Medicine 
in Baltimore, Maryland, the deal rekindles 
debates over consent and ownership of tissues, 
and data that arise from their study, at a time 
when the NIH is updating such rules. 

The HeLa cell line was established in 1951 
from a biopsy of a cervical tumour taken from
Henrietta Lacks, a working-class African- 
American woman living near Baltimore. The 
cells were taken without the knowledge or per- 
mission of her or her family, and they became 
the first human cells to grow well in a lab. They 
contributed to the development of a polio vac- 
cine, the discovery of human telomerase and 
countless other advances. A PubMed search 
for ‘HeLa’ turns up more than 75,000 papers.
“My lab is growing HeLa cells today,” Collins
told Nature in an interview on the NIH campus
in Bethesda, Maryland. “We’re using them for
all kinds of gene-expression experiments, as is
almost every molecular-biology lab.”

On 11 March, weeks before Collins drove 
to Baltimore to meet the Lacks family for the 
first time, a team led by Lars Steinmetz at 
the European Molecular Biology Laboratory 
(EMBL) in Heidelberg, Germany, published a 
paper called ‘The genomic and transcriptomic
landscape of a HeLa cell line’ (J. J. M. Landry
et al. Genes Genomes Genet. http://dx.doi.org/10.1534/g3.113.005777; 2013).
News coverage
(see go.nature.com/inxzuw) noted the link
to Henrietta Lacks, but not privacy concerns. 

Skloot, in a later article for The New York 
Times, made clear that family members were 
unhappy that — yet again — they had not been 
consulted. “I think it’s private information,” 
Henrietta’s granddaughter Jeri Lacks-Whye
told Nature. “I look at it as though these are 
my grandmother’s medical records that are 
just out there for the world to see.” The EMBL 
team removed the data from public access, and 
hoped that a solution could be reached. 

As the controversy erupted, Nature was 
preparing to publish an even more detailed 
sequence of the HeLa genome, according to 
senior author Jay Shendure, a genome scientist 
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at the University of Washington in Seattle. 
His team, funded by the NIH, started decod- 
ing HeLa DNA in 2011, as part of an effort 
to develop new sequencing techniques. They 
also hoped that the genome would be useful 
for other researchers, a motivation shared by 
the EMBL team. They submitted their paper 
to Nature in November 2012. 

The paper's reviewers did not raise privacy 
concerns before recommending it for pub- 
lication; nor did Nature, Shendure says. He 
considered contacting the Lacks family before 
publication, and restricting access. “Figuring 
out how to reach out to the family was very 
much on the table when events overtook us.” 

After Skloot’s article on the EMBL paper 
came out in March, Collins learned about 
Shendure’s NIH-funded project. He saw an 
opportunity. He was already at work reform- 
ing the rules that govern research on human 
subjects. “It looked as if this was a moment to 
get everybody in the same room,” he says.

And so, on the evening of 8 April, Collins 
met a group of Henrietta Lacks’ children and 
grandchildren for dinner and discussion at the 
Johns Hopkins campus. Along with Collins were
his chief adviser and two mediators from the 
university. Skloot phoned in to the meeting, 
which was to be the first of three. 

Collins says that family members told him 
how unsettling it had been to learn about HeLa 
cells decades after Lacks died. They peppered 
Collins with questions about genetic sequencing
and how Lacks’ cells had been used. “I felt
like I was taking ‘Biology 101’,” says Lacks-Whye.
Collins told them that Shendure’s team
might have identified the genetic change that 
made their grandmother's tumour so aggres- 
sive and HeLa cells so prolific. The NIH later 
put the family in touch with experts in clinical 
genetics who told them what health informa- 
tion could be gleaned from the genome, and 
the NIH offered to help family members have 
their own genomes sequenced and interpreted. 

Collins says that he did not pressure the fam- 
ily to agree to the release of the HeLa genome 
data; he was open to leaving the NIH-funded 
work unpublished. But he told the family that 
it would be impossible to keep the data locked 
away. NIH researchers had calculated that 
400 genomes’ worth of HeLa data are already 
publicly available in piecemeal form — parts 
of projects such as the Encyclopedia of DNA 
Elements — and that scientists in thousands of 
labs around the world could easily and cheaply 
sequence the cell line themselves. 

Some Lacks family members raised the 
possibility of financial compensation, Collins 
says. Directly paying the family was not on the 
table, but he and his advisers tried to think of 
other ways the family could benefit, such as 
patenting a genetic test for cancer based on 
HeLa-cell mutations. They could not think of 
any. But they could at least reassure the fam- 
ily that others would not make a quick buck 
from their grandmother’s genome, because
the US Supreme Court had this year ruled
that unmodified genes could not be patented.
Lacks-Whye says that the family does not want
to dwell on money — and that her father has
often said he “feels compensated by knowing
what his mother has been doing for the world”.

THE LACKS LEGACY

Story of the world’s most widely used
human biological research tissue.

1951: Biopsy of Henrietta Lacks’
tumour collected without her knowledge
or consent. HeLa cell line soon established.

The journal Obstetrics and
Gynecology names Henrietta Lacks as
HeLa source; word later spreads in Nature,
Science and mainstream press.

Lacks family members learn
about HeLa cells (pictured). Scientists later
collect their blood to map HeLa genes,
without proper informed consent.

1996: Lacks family honoured at
the first annual HeLa Cancer Control
Symposium, organized by former student
of scientist who isolated HeLa cells.

HeLa genome published without
knowledge of the family, which later endorses
restricted access to HeLa genome data.

In the end, the family decided that it wanted 
the data to be available under a restricted- 
access system similar to the NIH dbGaP data- 
base, which links individuals’ genetic make-up 
to traits and diseases. Researchers would apply 
for permission to acquire the data and agree 
to use them for biomedical research only, and 
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would not contact Lacks family members. A 
committee that includes family members will 
handle requests, and papers that use the data 
will recognize Henrietta Lacks and her kin. The 
first of these papers, the NIH-funded paper, is 
published in this issue (see page 207). 

In discussing HeLa cells and the agreement 
forged with the family, Collins and others often 
use the word “unique”. No other human sample 
matches the cell line for ubiquity, notoriety or 
celebrity (Oprah Winfrey is producing a film 
based on the story). The NIH does not see the 
deal with the family as a guide to handling other 
human samples. “It’s not going to be a precedent,”
says Collins’ chief adviser Kathy Hudson.

But it will probably inform other cases, she 
adds. The US government is redrafting rules 
that govern the relationship between federally 
funded researchers and participants. New rules 
aim to give subjects greater say in how their tissues
and personal data are used. “Going forward,
I’m very much of the mind that the most
appropriate way to show respect for persons is
to ask,” Collins says. “Ask people, ‘Are you comfortable
having this specimen used for future
genomic research for a broad range of biomedical
applications?’ — if they say no, no means no.”

As for the myriad other tissues out there 
that were obtained without consent, Collins 
says that it would slow science too much to 
ban their use. Laura Rodriguez, a policy offi- 
cial at the NIH who works on guidelines for 
genome sequencing, says that there is a low 
risk of donors of such samples being identi- 
fied. But in January, researchers working on a 
genomics project showed that it is possible to 
identify anonymous participants — and their 
families — by cross-referencing their genomes 
with genealogy DNA databases. 

Hank Greely, a biotechnology lawyer at Stan- 
ford University in California who has advised 
the EMBL group, says the HeLa agreement is a 
“good solution”, but applying it to other unconsented
cell lines and data would be unwieldy
and impractical. “The one thing we really 
should be doing is making sure everything we 
collect from here into the future is acceptable.” 

Lacks-Whye has similar advice. Researchers 
can make major breakthroughs, she says, while 
still respecting the wishes of patients and their 
families. “Have them involved,” she says. “That’s
not only for HeLa sequences, but anybody who 
participates in research.” SEE EDITORIAL P.121
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X-rays emitted by swirling disks of matter hint at the speed at which a supermassive black hole spins. 


ASTROPHYSICS 


Spin rate of black 
holes pinned down 


Calculation offers way to probe galactic evolution. 


BY EUGENIE SAMUEL REICH 


Black holes can be described by just
two fundamental characteristics: mass
and spin. Astronomers have been able
to measure the objects’ mass for decades, by 
looking for gravitational effects on the orbits 
of nearby stars. But measuring spin, which 
records the angular momentum of the matter 
that falls into the holes, has proved trouble- 
some, particularly for the supermassive black 
holes that lie at the centres of galaxies. No light 
emanates from the black holes’ spinning event 
horizons, so astronomers instead look for 
proxies that emit X-rays, such as the swirling 
disks of matter that feed into some holes. 

Such indirect spin measurements have now 
been made for 19 supermassive black holes for 
which the mass is also well known (see ‘Spin 
off’). On 29 July, astronomers reported that 
they had calculated the spin of another super- 
massive black hole, using a new technique that, 
although unproven, provides an alternative 
way to target the elusive quantity. “There’s a 
significant number of us who think we are get- 
ting a coherent picture of black hole spin,” says 
Andrew Fabian, an astronomer at the Univer- 
sity of Cambridge, UK. 

The conventional method used to meas- 
ure spin dates to 1995, although this has been 
controversial until recently. It relies on the 
detection of X-rays emitted from the corona,
a spherical halo of hot, ionized gas that sits just
above and below the plane of the black hole’s 
accretion disk. Some of these X-rays bounce 
off the disk and travel towards Earth. In them, 
astronomers can sometimes discern a promi- 
nent emission line characteristic of iron. The 
higher the black hole’s spin, the closer the 
accretion disk can get to the black hole’s event 
horizon, and the more the strong gravity can 
distort the iron line, spreading it over a wider 
range of X-ray energies. 

SPIN OFF

Some supermassive black holes spin at more than
90% of the speed of light, which suggests that they
gained their mass through major galactic mergers.

Scepticism about the method is beginning
to lift. In February, astronomers published
spin calculations (G. Risaliti et al. Nature 494,
449-451; 2013) that used data from NASA’s
NuSTAR mission, which was launched last year 
(see Nature 483, 255; 2012). Study leader Guido 
Risaliti, an astronomer at the Harvard-Smith- 
sonian Center for Astrophysics in Cambridge, 
Massachusetts, says that NuSTAR provides 
access to higher-energy X-rays, which has 
allowed researchers to clarify the influence of 
the black hole’s gravity on the iron line. These 
rays are less susceptible than lower-energy 
X-rays to absorption by clouds of gas between 
the black hole and Earth, which some had 
speculated was the real cause of the distortion. 

In the latest study, astronomers calculated 
spin more directly (C. Done et al. Mon. Not. 
R. Astron. Soc. http://doi.org/nc2; 2013). They 
found a black hole some 150 million parsecs 
away with a mass 10 million times that of 
the Sun. Using the European Space Agency’s 
XMM-Newton satellite, they focused not 
on the iron line but on fainter, lower-energy 
X-rays emitted directly from the accretion 
disk. The spectral shape of these X-rays offers 
indirect information about the temperature 
of the innermost part of the disk — and the 
temperature of this material is, in turn, related 
to the distance from the event horizon and the 
speed at which the black hole is spinning. The 
calculations suggest that, at most, the black 
hole is spinning at 86% of the speed of light. 

Study leader Chris Done, an astronomer at 
Durham University, UK, thinks her result casts 
doubt on spin measurements made using the 
iron line, because those results tend to come 
in at above 90%. “We're on the very edge of 
what we can do,” she says. “We have different 
methods and we’d like them to agree.” Others
argue that the differences in results may reflect 
a genuine variation between supermassive 
black holes, and suggest that spin may vary 
with mass, or over cosmic time. 

Much is at stake. If the spins of supermas- 
sive black holes are as high as some have found 
using the iron line, then these black holes are 
likely to have formed from rare, major merg- 
ers between colliding galaxies, in which a large 
quantity of material falls into the central black 
hole from one direction. If the spins are lower, 
as Done suggests, then the black holes may 
have formed from many minor mergers, with 
bite-sized lumps of material coming from vari- 
ous directions. The distribution of black hole 
spins could therefore inform researchers about 
the history of galactic evolution, particularly if 
astronomers can eventually chart the change in 
spin over cosmic time by looking at ever-more- 
distant black holes. 

Astronomers also want to understand 
whether spins power the jets of material that 
spew out from some black holes. But they 
cannot address these questions while dis- 
agreement remains over the spin measurement 
techniques, says Risaliti. He is optimistic that 
further X-ray observations will resolve the 
controversy. “It’s a long way to go, but this is 
the beginning,” he says.

THE SANDS


The scientific community is sharply divided 
over the proposed Keystone XL pipeline from 
Canada’s tar sands. 


BY JEFF TOLLEFSON 


Environmental activist Bill McKibben has called it the “fuse to the
biggest carbon bomb on the planet”. Famed US climate researcher
James Hansen has warned that it would unleash a “monster”. And
protestors have chained themselves to the White House fence, declaring
that it would feed a nasty fossil-fuel addiction and enrich the oil industry
while dooming the global climate.

The object of all that ire, the Keystone XL pipeline, is designed to carry
crude oil some 1,900 kilometres from the tar sands of Alberta, Canada, to
the US Midwest, where it will link into a network of pipelines supplying
refineries on the Gulf of Mexico. Proponents say that it would provide
North America with a secure source of energy and reduce dependence
on overseas oil. But for environmentalists frustrated by a stalemate in
Congress and repeated failures to secure an aggressive international
climate treaty, the pipeline has become a key battle — one that they hope
will trigger a popular uprising against unbridled fossil-fuel development.

The issue has also divided the scientific community. Many climate
and energy researchers have lined up with environmentalists to oppose
what is by all accounts a dirty source of petroleum: emissions from
extracting and burning tar-sands oil in the United States are 14–20% higher
than the country’s average oil emissions. But other researchers say that the
Keystone controversy is diverting attention from issues that would have
much greater impact on greenhouse-gas emissions, such as the use of coal.

The tar sands of Alberta supply 7% of the oil used in the United States.

Some experts find themselves on both sides. “I’m of two minds,” says 
David Keith, a Canadian climate scientist who is now at Harvard Uni- 
versity in Cambridge, Massachusetts. “The extreme statements — that 
this is ‘game over’ for the planet — are clearly not intellectually true, but 
I am completely against Keystone, both as an Albertan and somebody
who cares about the climate.” 


SIGNIFICANT FIGURES 

The pipeline’s future rests with US President Barack Obama, who 
declared in June that Keystone would serve the national interest only if it 
“does not significantly exacerbate the problem of carbon pollution”. The 
debate now centres on the definition of ‘significantly’, which requires
a bit of context. 

Canada has an estimated 170 billion barrels of dense, viscous oil 
locked up in deposits of loose sandstone in Alberta. These tar sands, or 
oil sands, produced 1.8 million barrels of oil per day in 2012, and that 
figure is projected to nearly triple by 2030. More than two-thirds of that 
oil makes its way via pipelines to the United States, where it accounts for 
around 7% of US oil consumption. But the pipelines are reaching capacity,
so companies looking to increase production must first figure out
how to get their product out of Canada. Keystone is just the first step and
would eventually carry some 730,000 barrels per day from the tar sands
to US refineries. To meet the industry’s production forecast, however,
at least three more pipelines of comparable capacity will be needed.

A draft environmental-impact statement by the US Department of
State found that halting the Keystone pipeline would have a minimal
impact on the development of the tar sands, because oil companies 
would find other ways to get their product to market. In the short term, 
that means shipping by rail, which would increase emissions. But the 
state department’s conclusions have come under fire from environmen- 
talists as well as the US Environmental Protection Agency, which urged 
the state department to conduct “a more careful review” of the economic 
analysis for its final assessment. 

There is cause for scepticism about the alternatives to the Keystone 
pipeline. Although companies are increasingly using rail to ship con- 
ventional oil out of North Dakota, where production is booming, that 
option is less attractive for the oil sands. Rail cars carrying tar-sands oil 
cost more to run because they must be heated, and they cannot carry 
as much, because the oil is much heavier than typical crude. Trains 
from Alberta must also travel much farther to reach the Gulf coast. 
The combination of factors increases prices by about US$20 per barrel. 


BARRELLING FORWARD 

Other proposed pipelines could transport the oil, but they also face chal- 
lenges. Last week TransCanada, the company behind the Keystone XL 
plan, said that it wants to build a $12-billion pipeline to the country's 
Atlantic coast, but environmentalists concerned about oil spills have 
vowed to block that route. And British Columbia’s ruling party took a
stance in May against a planned pipeline to Canada’s Pacific coast. 

“Really what you are left with is that the Keystone pipeline is the 
only route forward” to increase production from the tar sands, says 
Susan Casey-Lefkowitz, director of the international programme for the 
Natural Resources Defense Council in Washington DC. And because 
increasing production necessarily increases emissions, she says, Key- 
stone “fails the president's climate test”. 

But the matter is far from settled. The oil industry has already begun 
shipping conventional oil out of Alberta by rail, and IHS Cambridge 
Energy Research Associates, a consulting firm in Englewood, Colorado, 
projects that tar-sands producers will employ rail if no pipelines are built. 
“We think it’s economic, and we think it will 
grow in the absence of pipelines,” says Jackie 
Forrest, senior director for global oil at the firm. 
Others argue that the industry would push to 
build other pipelines if Keystone were to fail. 

The environmental impact must also be 
weighed against issues such as safety and 
energy security, says Andrew Weaver, a cli- 
mate scientist at the University of Victoria in 
British Columbia, who was elected this year 
to the province's parliament as a member of 
the Green Party. He refuses to weigh in on 
whether the pipeline should be built, say- 
ing that the decision rests with the United 
States. But he calls the argument for North 
American energy security “quite compelling”. And he cites the crash
last month of an oil-transport train in Quebec as evidence that the
potential for human error is higher with rail than with pipelines. “I 
think it’s mad that we are burning all of our oil,” Weaver says, “but we've 
got to put it into perspective.” 

In 2012, Weaver sought to do just that. He and a student calculated 
what would happen to global temperatures if the tar sands were fully 
developed. The proven reserves — those that could be developed 
with known technologies — make up roughly 11% of the global total 
for oil, and Weaver’s model suggested that full development would 
boost the average global temperature by just 0.03 degrees Celsius 
(N. C. Swart and A. J. Weaver Nature Clim. Change 2, 134-136; 2012). 
Weaver says that the initial focus should be on coal, which he found 
would have 30 times the climate impact of oil if the world burned all 
proven coal reserves. 

“As a serious strategy for dealing with climate, blocking Keystone 
is a waste of time,” says David Victor, a climate-policy expert at the
University of California, San Diego. “But as a strategy for arousing
passion, it is dynamite.” 

Well aware that their future prosperity may depend on it, producers as 
well as the government of Alberta — which hauled in about $4.3 billion 
in royalties from the oil sands in the 2011-12 fiscal year — say that they 
are cleaning up operations there. Within Canada much of the concern 
has focused on local pollution from the mining operations, which spew 
exhaust and toxic chemicals into the atmosphere and leave behind large 
wastewater ponds. But companies are also working to trim the overall 
greenhouse-gas emissions associated with production. 

Environment Canada, the country’s environment agency, claims that 
tar-sand producers reduced their emissions by 26% per barrel of oil 
between 1990 and 2010. But emissions are poised to increase in the com- 
ing years as companies probe deeper into the earth. The industry is up 
against geology. Having depleted many of the tar-sand deposits accessible 
through surface mining, companies are exploiting deeper formations by 
injecting steam into the rock layers to liquefy and produce the oil. Pro- 
ducing steam requires natural gas, which can increase emissions by up to 
30% compared with the surface-mining process (see ‘Dirty oil’). Those 
deeper deposits now account for roughly half of oil-sands production, 
but they make up 80% of proven reserves, so their share of production 
will only climb higher. 

Alberta is investing more than $700 million in a $1.35-billion 
demonstration project that would capture and bury up to 1.2 million 
tonnes of carbon dioxide annually from a facility that upgrades bitumen, 
the tar-like product from the oil sands, into crude oil for shipment. On 
its own, however, that project would not have a significant impact on 
emissions. More generally, Alberta enacted a law in 2007 requiring all 
major emitters in the province to reduce their annual emissions inten- 
sity — a measure of emissions relative to production — by 12% or face a 
levy of $14 for every tonne of emissions in excess of that target. That levy 
has raised $376 million for clean-energy investments to date, nearly $79 
million of which has been invested in projects related to the tar sands. 
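
As a rough illustration of how an intensity-based levy of this kind works, the sketch below computes what a single facility might owe. The 12% reduction and the $14-per-tonne rate come from the text above; the function name and the baseline, production and emissions figures are hypothetical, and the actual regulation may include details that this simple version ignores.

```python
def intensity_levy(baseline_intensity, production, emissions,
                   reduction=0.12, levy_per_tonne=14.0):
    """Levy owed by one facility under an emissions-intensity target.

    baseline_intensity: tonnes of CO2 per barrel before the rule (hypothetical)
    production:         barrels produced in the year
    emissions:          tonnes of CO2 actually emitted in the year
    """
    # The target is expressed relative to production, not as an absolute cap.
    target_intensity = baseline_intensity * (1.0 - reduction)
    allowed_emissions = target_intensity * production
    # Only emissions above the allowed amount attract the levy.
    excess = max(0.0, emissions - allowed_emissions)
    return excess * levy_per_tonne


# Hypothetical facility: 0.10 t per barrel baseline, 10 million barrels a year.
owed = intensity_levy(0.10, 10_000_000, 950_000)
print(f"Levy owed: ${owed:,.0f}")
```

Because the target scales with production, a facility that expands output can emit more in absolute terms and still owe nothing, which is why an intensity rule on its own does not cap total emissions.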

Yet there is no way to cleanly produce oil from the tar sands — or
from anywhere else. “The way that you drive down emissions in the
transportation sector is by driving less, by becoming more efficient, and
then by changing your fuels,” says Michael Levi, an energy-policy fellow
at the Council on Foreign Relations in New York. 

Many researchers who have sided with environmentalists on Key- 
stone acknowledge that the decision is mostly symbolic. But in the 
absence of other action, says Harvard’s Keith, it is important to get peo- 
ple involved and to send industry a message that the world is moving 
towards cleaner fuels, not dirtier ones. 

For Ken Caldeira, a climate researcher at the Carnegie Institution for 
Science in Stanford, California, it is a simple question of values. “I don't 
believe that whether the pipeline is built or not will have any detectable cli- 
mate effect,” he says. “The Obama administration needs to signal whether 
we are going to move toward zero-emission energy systems or whether we 
are going to move forward with last century’s energy systems.”


Jeff Tollefson covers energy and environment for Nature in New York.

Engineered structures with bizarre optical
properties are set to migrate out of the 
laboratory and into the marketplace. 


Tom Driscoll would be happy if he never heard the phrase “Harry Potter-style
invisibility cloak” again. But he knows he will. The media can’t seem
to resist using it when they report the latest advances in metamaterials
— arrays of minuscule ‘elements’ that bend, scatter, transmit or otherwise
shape electromagnetic radiation in ways that no natural material can. It is
true that metamaterials could, in principle, route
light around objects and render them invisible, not
unlike the cloak of a certain fictional wizard. And
many metamaterials researchers are trying to make
cloaking a reality, not least because the military has
eagerly funded the development of such capabilities.
However, if such applications ever come to pass,
it will be decades from now. Technologies closer to
commercialization are of more interest to Driscoll, a
physicist who oversees metamaterials commercialization
at Intellectual Ventures, a patent-aggregation firm
in Bellevue, Washington. Applications such as cheaper
satellite communications, thinner smartphones and
ultrafast optical data processing are “where metamaterials
are poised to make a huge impact”, he says.
Researchers still face some daunting challenges, 
he adds — notably, finding cheap ways to fabricate 
and manipulate metamaterial elements on a scale 
of nanometres. But the first metamaterial-based 
products are expected to come onto the market in a
year or so. And, not long after that, Driscoll expects
that average consumers will start to enjoy the benefits,
such as faster, cheaper Internet connectivity on
board planes and from mobile phones. Such applications,
he says, will move from being the stuff of
people’s fantasies “to becoming things they
can’t contemplate living without”.

The first laboratory demonstration of a metamaterial
was announced in 2000 by physicist
David Smith and his colleagues at the University
of California, San Diego. Following up on theoretical
work done in the 1990s by John Pendry
of Imperial College London, these researchers 
showed that an array of tiny copper wires and 
rings had a negative refractive index for micro- 
waves — meaning that microwave radiation 
flowing into the material is deflected in a direc- 
tion opposite to that normally observed (see 
‘Wave engineering’). That triggered intense 
interest in metamaterials, in part because the 
ability to bend radiation in such a way had 
potential for creating invisibility cloaks. 

Since then, Smith and others have explored 
a host of variations on the metamaterial idea, 
often looking to manipulate radiation in ways 
that have nothing to do with a negative refrac- 
tive index. They have also moved beyond static 
arrays, devising techniques to change the way 
the elements are arranged, how they are shaped 
and how they respond to radiation. The result- 
ing materials can do things such as turn from 
opaque to transparent or from red to blue — all 
at the flick of a switch. 


MARKET MOVERS 

In January, Smith, now at Duke University in 
Durham, North Carolina, took on a concur- 
rent role as director of metamaterials com- 
mercialization efforts at Intellectual Ventures. 
“I felt that the time was right, and we didn’t
need to do any more science for some of these 
things,” he says. 

A test case may come as early as next year. 
Kymeta of Redmond, Washington, a spin-off 
from Intellectual Ventures, hopes to market a 
compact antenna that would be one of the first 
consumer-oriented products based on meta- 
materials. The relatively inexpensive device 
would carry broadband satellite communi- 
cations to and from planes, trains, ships, cars 
and any other platform required to function 
in remote locations far from mobile networks. 

At the heart of the antenna — the details of 
which are confidential — is a flat circuit board
containing thousands of electronic metamat- 
erial elements, each of which can have its 
properties changed in an instant by the device's 
internal software. This allows the antenna to 
track a satellite across the sky without having 
to maintain a specific orientation towards it, 
the way a standard dish antenna does. Instead, 
the antenna remains still while the software 
constantly adjusts the electrical properties of 
each individual metamaterial element. When 
this is done correctly, waves emitted from 
the elements will reinforce one another and 
propagate skywards only in the direction of the 
satellite; waves emitted in any other direction 
will cancel one another out and go nowhere. 
At the same time — and for much the same
reason — the array will most readily pick up
signals if they are coming from the satellite.
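
The reinforce-and-cancel behaviour described above can be sketched with a textbook model: a row of identical emitters, each given a software-chosen phase offset. This is only an illustration of electronic beam steering in general, not Kymeta's confidential design; the element count, the half-wavelength spacing and the function name are assumptions.

```python
import cmath
import math


def array_factor(theta_deg, steer_deg, n_elements=32, spacing_wavelengths=0.5):
    """Relative field strength (0 to 1) radiated towards theta_deg by a
    uniform row of emitters that software has phased to point at steer_deg.
    Angles are measured from the direction perpendicular to the array."""
    k_d = 2 * math.pi * spacing_wavelengths  # phase advance per element spacing
    theta = math.radians(theta_deg)
    steer = math.radians(steer_deg)
    total = 0j
    for n in range(n_elements):
        applied_phase = -k_d * n * math.sin(steer)  # set electronically
        path_phase = k_d * n * math.sin(theta)      # set by geometry
        total += cmath.exp(1j * (path_phase + applied_phase))
    return abs(total) / n_elements


# Steer the beam 30 degrees off perpendicular without moving the array.
for angle in (-40, 0, 30, 60):
    print(f"{angle:>4} deg: {array_factor(angle, 30):.3f}")
```

In the steered direction the applied and geometric phases cancel for every element, so the contributions add fully; in other directions they largely sum to nothing. The same weighting applies on reception, which is why the array also listens preferentially towards the satellite.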

This technology is more compact than 
alternatives such as dish antennas, says Smith. 
It offers “significant savings in terms of cost, 
weight and power draw”. Kymeta has already 
performed demonstrations of this technology 
for investors and potential development part- 
ners. But Smith cautions that the company has 
yet to set a price for the antenna and that it 
must still work to bring production costs down 
while maintaining the strict performance 
standards that regulatory agencies demand 
for any device communicating with satellites. 

Kymeta has shared so few details of its 
antenna that researchers say it is hard to offer 
an evaluation. But Smith is highly regarded 
in the field. If Kymeta brings the product to 
market, it may first offer its antenna for use 
on private jets and passenger planes. If buyers 
respond well, the company hopes to incorpo- 
rate the technology into other product lines, 
such as portable, energy-efficient satellite- 
communication units for rescue workers or 
researchers in the field. 

In January, Smith's group turned heads when 
it announced its demonstration of another 
metamaterial device: a camera that can create 
compressed microwave images without a lens 
or any moving parts. One important application
of the device might be to reduce the cost
and complexity of airport security scanners. 

In their current form, these scanners have to 
physically sweep a microwave sensor over and 
around the subject. This produces an unwieldy 
amount of data that has to be stored before it 
is processed into an image. The Duke group’s 
device requires very little data storage. It takes 
numerous snapshots by sending beams of 
microwaves of multiple wavelengths across the 
target at about ten times per second. When the 
microwaves are reflected back by the subject, 
they fall on a thin strip of square copper meta- 
material elements, each of which can be tuned 
to block or let through reflected radiation. The 
resulting pattern of opaque and transparent ele- 
ments can be varied very rapidly, with each con- 
figuration transmitting a simplified snapshot of 
a scanned object into a single sensor. The sensor 
measures the total intensity of radiation from 
each snapshot, then outputs a stream of num- 
bers that can be digitally processed to recon- 
struct a highly compressed image of the subject. 
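
The single-sensor idea can be illustrated with a toy one-dimensional scene. The sketch below is not the Duke group's algorithm: it assumes a standard set of Hadamard on/off mask patterns and takes one reading per pattern, whereas a truly compressive scheme would use fewer readings than pixels and a sparsity-based reconstruction. All names and the scene values are hypothetical.

```python
def hadamard(order):
    """Sylvester construction of a +1/-1 Hadamard matrix.
    order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def measure(scene, masks):
    """One number per mask: the total intensity that passes through the
    transparent (1) elements and lands on the single sensor."""
    return [sum(m * s for m, s in zip(mask, scene)) for mask in masks]


def reconstruct(readings, h):
    """Recover the scene from the stream of single-sensor readings."""
    n = len(h)
    total = readings[0]  # the first mask is fully transparent
    # Convert on/off readings into the equivalent +1/-1 Hadamard projections.
    signed = [2 * y - total for y in readings]
    # Hadamard rows are mutually orthogonal, so inversion is a transpose.
    return [sum(h[i][j] * signed[i] for i in range(n)) / n for j in range(n)]


scene = [0, 3, 0, 0, 5, 0, 1, 0]  # hypothetical reflected intensities
h = hadamard(len(scene))
masks = [[(v + 1) // 2 for v in row] for row in h]  # -1 -> opaque, +1 -> open
readings = measure(scene, masks)
recovered = reconstruct(readings, h)
print(readings)
print(recovered)
```

Each reading is a single number, so nothing resembling a full image ever has to be stored; the picture exists only after the stream of numbers has been processed.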

This is admittedly just a first step: demon- 
strations carried out so far have been crude 
affairs restricted to two-dimensional images 
of simple metallic objects. Expanding it to 
three-dimensional images of complex objects 
remains a challenge. But if that challenge can 
be overcome, says Driscoll, airports could retire 
the bulky, expensive, slow booths that currently 
constitute security checkpoints, and instead use 
a larger number of thin, inexpensive metamate- 
rial cameras hooked up to computers. Such a 
shift, Driscoll says, could extend security scanning
to rooms, hallways and corridors throughout
airports and other sensitive facilities.

WAVE ENGINEERING

Metamaterial elements scatter incoming radiation in very precise ways. They
can be any shape; common examples include spheres, rings, crosses and
chevrons. Their electromagnetic properties can often be changed by software.

The spacing between the elements can vary, but is always less than the
wavelength of the radiation.

Collectively, the array of elements functions similarly to a hologram,
shaping the radiation in ways no natural material can.

Example: Negative index of refraction. Metamaterials can be engineered to
bend radiation in a direction opposite to that observed in ordinary
materials. (Diagram panels: positive refraction; negative refraction.)

Application: Invisibility cloak. A cloak made of a negative-index
metamaterial can bend radiation around an object inside it, making that
object seem invisible.

In the meantime, a key research goal for
Smith and his group is the development of 
robust and marketable metamaterial devices 
that are not restricted to radio, microwave 
or infrared wavelengths. If the technologies 
could be made to work with visible light, they 
would become much more useful for applica- 
tions such as fibre-optic communications or 
consumer-oriented cameras and displays. 

“It won’t be easy,” cautions Stephane
Larouche, a member of Smith’s research team 
at Duke. For any given type of radiation, he 
explains, metamaterials can wield their exotic 
powers only if the elements are smaller and 
more closely spaced than the wavelength of 
that radiation. “So the shorter and shorter the 
wavelength we wish to use, the smaller each 
metamaterial element must be,” says Larouche.

In the microwave and radio regions of the 
spectrum, this is relatively easy: wavelengths 
are measured in centimetres to metres. But an 
optical metamaterial’s elements would have to 
measure considerably less than a micrometre. 
That is not impossible: today’s high- 
performance microchips contain 
features only a few tens of nanome- 
tres across. But unlike those essen- 
tially static features, says Larouche, 
the metamaterial elements in many 
applications would need to incorpo- 
rate ways for software to change their proper- 
ties dynamically as needed. “Too often we have 
gorgeous ideas,” he says, “but we have no way
of fabricating them.”


FLAT FOCUS 

Despite these difficulties, workable designs for 
optical metamaterials have begun to emerge. 
One was published in March by a group
working under Nikolay Zheludev, a physicist
at the University of Southampton, UK, who 
directs a research centre focused on metama- 
terials at Nanyang Technological University 
in Singapore. The team’s device can greatly 
alter its ability to transmit or reflect optical 
wavelengths by means of nanometre-scale, 
electrically controlled metamaterial elements 
etched from gold film; it could one day serve 
as a switch in high-speed fibre-optic commu- 
nications networks. 

Meanwhile, because it is so hard to make and 
control three-dimensional metamaterial arrays 
at optical scales, some researchers are focusing 
on two-dimensional ‘metasurfaces’. In August
2012, a group led by Federico Capasso at Har- 
vard University in Cambridge, Massachusetts, 
unveiled a flat metamaterial lens that can focus 
infrared light to a point in much the same way 
as a glass lens. “I don’t want to claim absolute
novelty in this,” Capasso says, “but I believe we 
are the first group to so clearly put flat optics on 
the agenda for commercial applications.” 

A conventional lens relies on refraction 
to bend light to a point by passing it through 
varying thicknesses of glass. Capasso’s lens 
passes light through a two-dimensional array
of gold metamaterial elements carved out of a
60-nanometre-thick silicon wafer using elec- 
tron-beam lithography techniques developed 
for the microchip industry. The elements are 
fixed, so cannot be tuned after fabrication. But 
by selecting a specific size and spacing during 
the manufacturing process, physicists can shape 
light of a chosen wavelength in exactly the right 
way to make it come to a point. 

Capasso warns that commercial applications 
of such flat lenses are probably still a decade 
away. This is partly because silicon is a rigid 
and fragile substrate for etching the elements; 
researchers are looking at more robust and 
flexible alternatives that would be easier to 
handle on the production line. They are also 
looking for better ways to control the carving 
of the nanoscale elements, which has to be 
done very precisely. 

But once the technology is mastered, says 
Capasso, one obvious application is in smart- 
phone cameras. Lenses, along with batteries, 
are among the most stubborn limiting factors
in smartphone thickness, he says, speculating
that a smartphone incorporating a flat camera 
lens could potentially be made “as thin as a 
credit card”. The flat lens also avoids aberra- 
tions that plague glass lenses, such as the col- 
oured ‘fringes’ created by the inability to focus 
all wavelengths to the same point. This means 
that Capasso’s flat lens could also be used to
make better, aberration-free microscopes. 

As good as they might ultimately be, the flat 
lenses would still be subject to the diffraction 
limit, which dictates that no conventional lens 
can resolve details much smaller than the wave- 
length of the light that illuminates its target. 
This limit averages about 200 nanometres for 
visible light. But metamaterials offer a means of 
fabricating ‘superlenses’ that could surpass such 
limits, allowing researchers to see sub-wave- 
length details of target objects such as viruses 
or the ever-changing structures in living cells. 
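
The figure of about 200 nanometres can be checked against the standard Abbe formula for the smallest resolvable spacing, d = wavelength / (2 × numerical aperture). The formula is textbook optics rather than something stated in this article, and the numerical aperture of 1.4 (typical of a good oil-immersion microscope objective) is an assumption chosen for illustration.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Smallest resolvable feature spacing for a conventional lens,
    using the Abbe criterion d = wavelength / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


# Violet, green and red light through an assumed NA = 1.4 objective.
for wavelength in (400, 550, 700):
    limit = abbe_limit_nm(wavelength, 1.4)
    print(f"{wavelength} nm light -> about {limit:.0f} nm")
```

Across the visible range the limit runs from roughly 140 to 250 nanometres under this assumption, consistent with an average near 200 nanometres and well above the size of many viruses.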

The key is to recognize that the missing 
details are still there, carried in ‘evanescent’ 
waves of reflected light that die away very rap- 
idly with distance from the illuminated object. 
Normally, these waves have effectively vanished 
before they can be captured and focused by a 
lens. But a metamaterial superlens designed 
to be placed within tens of nanometres of an 
object can pick up and magnify these waves. 

An early proof-of-concept superlens was 
demonstrated in 2005 by a group working 
under Xiang Zhang, a physicist at the University
of California, Berkeley. Zhang’s group
produced a simple metamaterial consisting of a
35-nanometre-thick layer of silver in a sandwich
with nanoscale layers of chromium and plastic.
The team has since been working to refine the
superlens concept; in 2007 the researchers
advanced the idea by developing ‘hyperlenses’
from curved, nested layers of compounds such
as silver, aluminium and quartz. The lenses not
only capture evanescent waves, but can also feed 
them into a conventional optical system. Ulti- 
mately, this could allow sub-wavelength details 
to be viewed through the eyepiece of a stand- 
ard microscope. But the complex structure and 
behaviour of hyperlenses makes them difficult 
to manufacture and use in this way. 


REVERSIBLE FOCUS 

By pairing conventional optics with super- 
lenses and hyperlenses based on meta- 
materials, Zhang hopes eventually to find 
applications far beyond microscopy. Just as 
these constructs can magnify sub-wavelength 
detail, they can also be run in reverse, direct- 
ing beams of light into sub-wavelength focal 
points — a property of potentially revolution- 
ary importance for fabricating minuscule 
structures using photolithography. 
If superlenses and hyperlenses can 
be harnessed for this purpose, the 
ultra-fine beams of light could be 
used to etch much smaller features 
than is possible today. This could 
greatly increase the density of data 
storage on optical drives, as well as the num- 
ber of components that can be crammed onto 
computer chips. 

Smith is cautious on that score, pointing out 
that hyperlenses and superlenses tend to dis- 
sipate substantially more of the light energy 
passing through them than other advanced 
lithographic techniques now in development. 
This, he says, makes them prime examples of 
“strong and compelling science that is not yet 
practical for any sort of product path” at optical 
wavelengths. But, he adds, Zhang’s efforts are 
“heroic experiments that illustrate the poten- 
tial of metamaterials in a fundamental way”. 

Zhang concedes that hyperlenses and super- 
lenses are not yet ready for prime time, but 
believes there is plenty of room for ongoing 
research to change that situation in the coming 
years. “The economic impact could be huge,” he 
says. “I am cautiously optimistic that metama- 
terials, superlenses and lithography will prove 
truly revolutionary. If people aren’t too short-sighted,
what we can do with metamaterials will
be limited only by our imaginations.”


Lee Billings is a freelance writer based in New
York.

Henrietta Lacks’ family gather around a historical marker dedicated to her in Virginia in 2011.


Family matters 


Kathy L. Hudson and Francis S. Collins discuss how and why the US National 
Institutes of Health worked with the family of Henrietta Lacks, the unwitting source 
of the HeLa cell line, to craft an agreement for access to HeLa genome data. 


In March, two of the most deeply held
values in the medical-research community
— public data-sharing and respect
for research participants — collided when
the genome of the ubiquitous cell line HeLa
was published and posted in a public database.
Controversy ensued. The full sequence
data could potentially uncover unwanted
information about people whose identity
is widely known: the family of the woman
from whom this immortal line was derived
62 years ago, Henrietta Lacks.

So, since March, the US National Institutes 
of Health (NIH) in Bethesda, Maryland, has 
worked closely with Lacks’ family. Together, 
we have crafted a path that addresses the 
family’s concerns, including consent and 
privacy, while making the HeLa genomic 
sequence data available to scientists to 
further the family’s commitment to biomed- 
ical research. 

The agreement that we reached goes into 
effect this week. We hope that it, and its genesis,
will spur broader discussions regarding
consent for future use of biospecimens, with
a goal of fostering true partnerships between 
researchers and research participants. 


MEDICAL HISTORY 

In 1951, physicians at Johns Hopkins Hospital 
in Baltimore, Maryland, took a biopsy from 
Henrietta Lacks, a 31-year-old African 
American woman who had an aggressive 
form of cervical cancer. This biospecimen 
was taken without her permission or knowledge;
US regulations requiring consent
were still decades away. The tissue sample
gave rise to the first human cancer-cell line
that could grow endlessly in culture, called
HeLa. Henrietta died later that year, but her
cells live on. Today, more than 60 years later,
scientists around the world use HeLa cells for
research on almost every disease. The story of
Lacks’ unwitting contribution to science, and
the proud and poignant legacy it left for her
descendants, is told in Rebecca Skloot’s best-selling
book, The Immortal Life of Henrietta
Lacks (Crown, 2010), which is now being
made into a film by Oprah Winfrey’s production
company.

8 AUGUST 2013 | VOL 500 | NATURE | 141

The German research team that in March 
this year posted the HeLa genome on open- 
access databases available through the 
European Bioinformatics Institute and the 
NIH’s National Center for Biotechnology 
Information did not violate any laws or rules. 
The action did, however, upset the Lacks 
family, and it drew criticism from many 
quarters. The genome of these cells is not
identical to Lacks’ original genome. The cells 
carry the genetic modifications that allowed 
them to form a tumour and grow prolifically; 
and their passage in cell culture for more 
than six decades has led to other structural 
anomalies. Nonetheless, the sequence can 
reveal certain heritable aspects of Lacks’ ger- 
mline DNA, and can thus be used to draw 
inferences, admittedly of uncertain signifi- 
cance, about her descendants. 

Within days, the European research- 
ers removed the sequence from the public 
databases, to allow time for consideration 
of alternative approaches. Meanwhile, an 
NIH-funded research paper by Andrew Adey 
and colleagues on the genome sequence of 
a second HeLa line was in press at Nature 
(published in this issue; see page 207).
Nature mandates that authors of research 
papers make their data publicly available 
online. Something needed to be done — and 
in partnership with the Lacks family. 


WEIGHING THE OPTIONS 

Over the past four months, with help from 
Skloot and academic leaders at Johns 
Hopkins, we met members of the Lacks 
family in Baltimore on three occasions. At 
their request, some family members also met 
separately with an NIH genetic counsellor 
and medical-genetics expert to learn more 
about what the data might say about family 
members, and the implications of having it 
in the public domain. 

We talked at length with the family about 
the three options available for the full HeLa 
sequence data: first, making the sequence 
freely available, allowing anyone access at any 
time and for any use; second, placing the data 
in a controlled-access database, which would
require researchers to apply to the NIH to 
use the data in a specific study and to agree 
to terms of use defined by a panel including 
members of the Lacks family; or third,
withholding the sequence and not making it 
available at all for research — an option that 
the NIH would have had difficulty supporting 
or implementing, philosophically and legally. 
After much discussion, family members 
unanimously favoured the controlled-access 
option. This will allow them to be aware of 
and have a crucial role in the science that 
uses the HeLa genome. The NIH will help to 
implement this, but respecting the family’s 
preferences has required (and will continue to 
require) cooperation and patience by many — 
including scientists, publishers, funders and 
scientific societies. The authors and publishers
of both genome papers have agreed to submit
their data for controlled access (in the same
way as for many other non-HeLa genome
sequences) through the NIH’s database of
genotypes and phenotypes (dbGaP; see
go.nature.com/fduced). Likewise, NIH-funded
researchers who sequence other HeLa lines will be
expected to deposit their data in the dbGaP. 
We hope that scientists whose work is supp- 
orted by other funders will do the same. 

Applications for access to the sequence data 
will be rapidly reviewed by a newly formed 
HeLa Genome Data Access working group 
at the NIH, on which two members of the 
Lacks family will serve. We believe that this 
plan reflects the true partnership between 
the Lacks family and the biomedical-research 
community. We also ask that all researchers 
who generate or use genomic data from HeLa 
cells include in their publications an acknowl- 
edgement of the contribution of Lacks and the 
continued generosity of her family, such as 
that in Adey and colleagues’ paper’. 

Of course, someone could still stitch 
together a reasonable representation of the 
HeLa genome from the estimated 1,300 giga- 
bytes of data already in public databases, 
which have been accumulating over the past 
25 years — and the family knows this. The 
family is also aware that any lab with the right 
equipment, and non-NIH funds, could derive 
the full sequence from scratch at any point 
and post it on a non-NIH website. How- 
ever, we urge the research community to act 
responsibly and honour the family’s wishes. 
Downloading the HeLa sequence through 
controlled access is the right and respectful 
thing to do. 

It is important to note, however, that we 
are responding to an extraordinary situation 
here, not setting a precedent for research 
with previously stored, de-identified speci- 
mens. The approach we have developed 
through working with the Lacks family is 
unique because HeLa cells were taken and 
used without consent, and gave rise to the
most widely used human cell line in the
world, and because the family members are 
known by name to millions of people. 

The furore around HeLa cells has brought 
the absence of consent requirements for 
some biospecimen research to public atten- 
tion. Under current US federal guidelines, it 
is still possible to use specimens and to gener- 
ate whole-genome sequencing data without 
the knowledge or permission of the person 
providing the sample, as long as the biospeci- 
men meets the definition of ‘de-identified’ 
(see go.nature.com/2jrzvz). The administra- 
tion of President Barack Obama is undertak- 
ing fundamental reforms for the protection 
of human subjects in research. Among the 
factors motivating these reforms is the recog- 
nition that non-identifiability is increasingly 
illusory, owing to technological advances, 
especially in genomics and computing. In
addition, the relationship between research- 
ers and participants is evolving: seeking 
permission emphasizes that participants are 
partners, not just ‘subjects’.

In July 2011, the US Department of Health 
and Human Services issued a notice request- 
ing public comment on how current regula- 
tions for protecting participants in research 
might be revised to be more effective (see 
go.nature.com/LL6es9). Among other ques- 
tions, the notice sought comment on whether 
the department should require consent for 
future research using samples, identified or 
not. The notice also sought input on the use of 
broad consent for unspecified future research 
use of specimens. The question assumed that 
specimens that were collected before a change 
in regulations would be governed by the old 
rules. On the basis of those public comments, 
the department is preparing a new proposal. 

It is fitting, given the priceless contri- 
butions that Henrietta Lacks has made 
to science and medicine, that her story 
is catalysing enduring changes in policy. 
These should afford future generations of 
research participants the protections and 
respect that were not in place during Lacks’ 
lifetime. SEE WORLD VIEW P.123


Kathy L. Hudson is deputy director for 
science, outreach and policy at the National 
Institutes of Health (NIH) in Bethesda, 
Maryland. Francis S. Collins is director of 
the NIH. 

e-mail: kathy.hudson@nih.gov 
CHINA PHOTOS/GETTY


Steel mills and other heavy industries amplify carbon emissions in Inner Mongolia, while the products are consumed in more affluent parts of China. 


A low-carbon road map for China


Recycling, renewables and a reinvigorated domestic energy market will allow China 
to lead the world in low-carbon development, say Zhu Liu and colleagues. 


China is a major force behind anthropogenic
carbon emissions and their mitigation. The
world’s leading primary energy consumer in
2012, China devoured almost half of all coal
produced. The nation accounted for one-quarter
of global carbon dioxide emissions in 2011 and
80% of the world’s rise in CO₂ emissions since
2008 (ref. 1).

Facing international pressures to curb 
its CO₂ releases, as well as a tight domestic
fossil-energy supply and high levels of air
pollution, China has implemented a bold
national strategy for energy conservation
and emissions mitigation. The country
plans to reduce its carbon intensity (CO₂ per
unit of gross domestic product, or GDP) to 
55-60% of 2005 levels by 2020. 

This can be achieved only if China becomes
a low-carbon economy. With powerful
regulatory control, we believe that the nation’s 
energy appetite could drive the development 
and use of low-carbon technologies, in which 
China could become a world leader. We iden- 
tify the major challenges in such a transition, 
and propose a five-pronged strategy to get 
China onto this low-carbon pathway. 

First, China must move away from coal 
and boost recycling and renewable energies. 
Second, emissions-mitigation indicators, 
such as energy-efficiency targets, should 
be set relative to physical output (such as 
tonnes of steel production) rather than to 
economic growth. Third, regional energy 
supply and demand must be balanced. 
Fourth, energy prices should be linked to 
market mechanisms rather than set centrally 
by authorities. And fifth, China must reduce 
air pollutants alongside CO₂ emissions.

China has made great progress in cutting 
carbon emissions in the past decade. In its 
11th five-year plan (2006-10), the govern- 
ment set goals to reduce energy intensity 
(energy consumption per unit of GDP) by 
20% on average across all provinces by 2010. 
Thousands of inefficient power plants and 
factories were closed to meet the targets,
saving the equivalent of 750 million tonnes
of coal and 1.5 billion tonnes of CO₂ (5% of
global CO₂ emissions in 2010).

The government’s 12th five-year plan 
(2011-15) calls for a 16% reduction in energy 
intensity and for a 17% reduction in carbon 
intensity. Each region has been allocated 
mandatory targets. Such cuts would save 
about 1.4 billion tonnes of coal between 2006 
and 2015, reducing CO₂ emissions by more
than 3 billion tonnes (roughly 60% of US
emissions in 2010). As a result, air-pollution
levels would fall. 


LOW-CARBON CHALLENGES

Challenges remain in curbing fossil-energy
use and emissions while maintaining
economic growth. Although two-thirds of
China’s provinces met their intensity targets
for 2006-10, CO₂ emissions rose nationwide
in that period by 50% as the economy grew.
Efficiency targets were met by expanding
the scale of production. When pressure to reach
the targets became so great that several prov-
inces instituted blackouts, many factories
turned to inefficient diesel power generators,
leading to a national diesel shortage in 2010.

Infrastructure construction has been the
major driver of China's rapid economic and
emissions growth since 2002. As a result, the
economy relies on carbon-intensive indus-
tries (see ‘Economic growth’). From 2005 to
2011, many of China’s industries grew faster 
than its GDP, which rose by 87% (constant 
price at 2005 value). Thermal power genera- 
tion grew by 90%, steel production by 135%, 
cement by 96% and vehicle production by 
223% (ref. 4). In 2008, the growth was further 
exacerbated when China initiated a 4-trillion- 
renminbi (US$600-billion) economic stim- 
ulus plan, of which 85% was for building 
infrastructure. Today, China accounts for 
much of the world production of crude steel 
(45%), cement (60%), primary aluminium 
(44%), coke (64%) and coal (50%). Almost all 
of these products are consumed domestically.

China's energy- and emissions-intensity
targets are expressed as ratios of energy use 
or total emissions to GDP. There are thus 
two ways to achieve the targets: by upgrad- 
ing equipment and industrial processes to use 
less energy and to drive down emissions, or 
by expanding the scale of production, thereby 
boosting GDP. Both strategies, but especially 
scale expansion, have contributed to China's 
improved energy intensity. However, they 
have resulted in much higher emissions over- 
all. For instance, from 2002 to 2009 China’s 
coal-fired power plants improved their 
energy intensity by 10%, but because their 
production capacity more than doubled, total 
emissions from the sector also doubled.
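
The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration: the function name and the baseline of 100 units are assumptions introduced here, the 10% intensity gain is the figure quoted in the text, and output is assumed to double exactly, which understates the "more than doubled" capacity reported.

```python
def emissions_after(base_emissions, intensity_change, output_growth):
    """Total emissions after a fractional change in intensity and in output.

    Emissions = intensity x output, so the two fractional changes multiply.
    """
    return base_emissions * (1 + intensity_change) * (1 + output_growth)


# Illustrative baseline in arbitrary units (an assumption, not data from the article).
baseline = 100.0

# Intensity improves by 10% (figure from the text); output is assumed to double.
after = emissions_after(baseline, intensity_change=-0.10, output_growth=1.0)

print(f"emissions change: {after / baseline - 1:+.0%}")
```

Under these assumptions the intensity target is met while total emissions still rise by roughly 80%; any growth in output beyond a doubling pushes the total closer to the doubling of emissions reported for the sector.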

Current emissions targets may exaggerate
regional development inequalities by
outsourcing the mitigation cost from rich to 
poor. Some of China’s poorest regions rely on 
carbon-intensive industries, such as cement 
and steel, and such regions have per capita 
emissions approaching those of the United 
States (see ‘China emissions’). To support 
economic development, these regions have 
been allocated small reduction targets. But 
most products from these poorer areas are 
consumed in more affluent regions.

For example, in 2010, about 20% of carbon
emissions from Inner Mongolia, a poor
region in the north of the country, were from
the production of electricity that was exported 
to other provinces. Two-thirds of the region’s 
processed metals, half of its chemical prod- 
ucts and 43% of its cement were also sent to 
more developed areas on the coast. 

Beijing and Shanghai import about 70% 
and 33% of their electricity, respectively. In
this way, the cities avoid emitting 50 million
tonnes and 38 million tonnes of CO₂, respectively.
If each region’s emissions, and those
embodied in traded products, were allocated 
to final consumers, Inner Mongolia would 
have exceeded its energy-intensity reduc- 
tion commitments by around 40% in 2010, 
whereas Shanghai and Beijing would have 
failed to achieve theirs. 
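
The reallocation described above, from production-based to consumption-based accounting, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the regional production totals and the single "exporting provinces" entry are hypothetical; only the 50 and 38 million tonnes of avoided CO₂ come from the text.

```python
def consumption_based(production, transfers):
    """Move emissions embodied in traded goods from producer to consumer.

    production: dict mapping region -> emissions released inside the region
    transfers:  list of (producer, consumer, amount) tuples
    """
    totals = dict(production)
    for producer, consumer, amount in transfers:
        totals[producer] -= amount
        totals[consumer] += amount
    return totals


# Hypothetical production-based inventories, in million tonnes of CO2.
production = {"exporting provinces": 500.0, "Beijing": 100.0, "Shanghai": 200.0}

# Emissions embodied in imported electricity. The amounts 50 and 38 are the
# figures quoted in the text; a single exporting region is a simplification.
transfers = [
    ("exporting provinces", "Beijing", 50.0),
    ("exporting provinces", "Shanghai", 38.0),
]

for region, total in consumption_based(production, transfers).items():
    print(f"{region}: {total:.0f} Mt CO2")
```

The national total is unchanged by the reallocation; only the attribution between exporting and importing regions shifts, which is why the choice of accounting basis affects whether a given region appears to meet its target.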

The immaturity of China’s electricity grid
results in power shortages and inefficiencies. 
Many new power plants remain unconnected 
to the national grid, owing to poor coordina- 
tion between local governments and the State 
Grid Corporation of China, which builds and 
manages all grids in the country. One-third 
of Inner Mongolia’s power capacity (100 bil- 
lion kilowatt hours, or kWh) remains unused 
each year for this reason. 

Because the prices of fuels such as coal are 
set by and fluctuate with the market, but the 
cost of electricity consumption is fixed by 
the central government, power plants slow 
their production rate when fuel prices are 
high. This is inefficient and causes blackouts. 
In 2010, China produced only about half of 
its capacity of 6,220 billion kWh. 


LOW-CARBON LEAPFROGGING 

China’s economy can be decarbonized only by 
reducing fossil-energy demand and emissions 
together. Recycling construction materials 
could reduce total energy intensity by as 
much as 90%. China recycled about 70 mil- 
lion tonnes of steel scrap in 2008, and scrap 
has the potential to replace 80% of iron ore 
as a resource for primary steel production by 
2050 (ref. 7). Schemes encouraging ‘urban
mining’ of scrap and exchanging by-products
among regional factories are needed.

ECONOMIC GROWTH

China’s manufacturing output has risen rapidly
since 2005 (relative levels shown).

China leads the world in renewable energy, 
having invested $68 billion in 2012 — more 
than one-fifth of the global total for that year. 
The country’s installed renewable capacity of 
300 gigawatts (GW) in 2011 was already twice 
the US capacity for the same year (146 GW). 
China’s wind turbines and hydropower 
stations were the world’s most productive in 
2011, generating 70 billion kWh and 720 bil- 
lion kWh, respectively. 

Yet China is producing more renewable 
technologies than it can use. In 2012, the 
country’s manufacturing capacity for solar 
photovoltaic cells reached 40 GW. But it 
produced only 23 GW, which accounts for 
60% of the world’s total annual production (37
GW). Less than 10% of these domestically
produced photovoltaics were installed in the 
country — in part owing to the lack of grid 
connections. The rest were exported. Further 
investments are urgently needed to extend the 
electricity grid to remote rural regions, where 
solar power is a solution for electrification, 
and to explore other markets for the surplus 
supply of panels. 

There is great potential for low-carbon 
energy technologies in China. Wind power 
alone could meet the entire projected 
increase in electricity demand up to 2030. 
Introducing 640 GW of wind capacity 
(costing about $900 billion) over the next 
20 years will reduce carbon emissions by 
30% in the period. Using China’s waste
gutter oil (13.7 million tonnes in 2010),
which is refined from discarded cooking
oil, for biomass fuel could reduce CO₂ emissions
by some 90 million tonnes per year.
Such a reduction would be equivalent to 15% 
of the total emissions reduction from 1990 
to 2008 by the 39 industrialized countries 
under Annex B of the Kyoto Protocol — the 
international treaty that commits parties to 
cutting greenhouse-gas emissions. 

In the meantime, cleaner, non-renewable 
options, such as natural gas and nuclear 
power, could provide a buffer during the low- 
carbon transition. Through further exploita- 
tion of coal-bed methane and by improving 
connectivity of domestic and international 
gas pipelines, natural-gas consumption in 
China could swell to 250 billion cubic metres 
by 2020, with double-digit growth from 
2010 to 2020. Switching from coal to gas would 
simultaneously reduce air pollutants, such 
as sulphur dioxide and nitrogen oxides, 
as well as CO₂.

With 17 reactors in operation, nuclear 
power currently accounts for 1% (13 GW) 
of China’s electricity production capac- 
ity. Another 28 nuclear plants are under 
construction and more technologically 
advanced reactors are planned. The total 
nuclear capacity is set to rise to 80 GW by 
2020, 200 GW by 2030, and 400 GW by 2050. 


SOURCE: REF. 4

CHINA EMISSIONS

Carbon dioxide release levels vary across regions. Poorer areas such as
Inner Mongolia have high emissions owing to heavy industries.

To measure and promote progress, energy
and emissions targets must be considered 
separately to economic performance. Physi- 
cal intensity indicators, such as emissions per 
unit production of steel, should be used rather 
than relative economic intensity indicators, 
such as emissions per unit of GDP. China has 
set a coal-consumption cap of about 3.9 bil- 
lion tonnes by 2015. Similarly, a national 
carbon-emissions cap should be introduced, 
leveraged by energy taxes and allowances. 

A carbon budget that considers both emis- 
sions and offsets from carbon sinks should be 
introduced for CO₂ inventories. This would
encourage forest planting and waste manage-
ment through carbon-credit schemes. The
carbon sink created by China’s enhanced
afforestation and territorial ecosystem might
have taken up as much as 15% of China’s fossil-
fuel CO₂ from 2002 to 2007 (ref. 10).

Regional compensation mechanisms 
would accelerate technology transfer 
between Chinese provinces. Targets for 
industrial sectors, rather than regions, 
would lessen geographic disparities. Regions 
should count emissions according to elec- 
tricity consumption rather than by produc- 
tion. Mitigation responsibilities should be 
required throughout the supply chain of 
energy-intensive enterprises, from company 
headquarters in rich areas to factories in poor 
ones. Rigorous environmental standards 
need to be strictly applied. 

China’s first domestic carbon market for
permits to discharge CO₂ opened in June this
year in Shenzhen, with permits for 20,000
tonnes of CO₂ traded on the first day. Full
marketization with moderate administrative
intervention in China's fuel consumption will
be crucial for the success of its domestic cap-
and-trade scheme, which is currently being
tested in seven provinces and municipalities,
affecting an annual 1.5 billion tonnes of CO₂
emissions. The system will be implemented
nationwide in 2016.

Through the Kyoto Protocol, China is 
hosting more than 2,000 projects (half of 
the world total) under the Clean Develop- 
ment Mechanism, which can offset about 
600 million tonnes of CO₂ from China. The
domestic cap-and-trade system is expected
to cover 1 billion tonnes of CO₂ per year by
2015 (about 10% of China’s total CO₂ emissions)
and generate billions of US dollars of
government revenue. 

A successful cap-and-trade scheme 
requires reliable carbon data, a transparent 
carbon market and fair credit allocations. 
China's central government should com- 
pile and verify the emissions inventories; 
coordinate, monitor and report on the market 
measures; define the reduction baselines and 
certify emissions-reduction credits. 

The wealthiest 5% of China’s population 
accounts for 25% of electricity use. The 
government should pioneer other economic 
mechanisms, such as a carbon tax, that 
target consumption. As a first step, China 
has implemented a household ‘incline block 
tariff’ since July 2012, whereby electric- 
ity purchased beyond a certain limit has a 
higher price. Similar taxes should be intro- 
duced for other consumables (such as vehicle 
fuels) and the revenue used to subsidize 
low-carbon products (such as electric 
cars), renewable-energy development and 
energy infrastructure construction.
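
The household block tariff mentioned above, in which electricity beyond a certain limit costs more, can be sketched as a simple calculation. The function name, block limits and prices below are hypothetical values chosen for illustration; they are not the thresholds or rates used in China.

```python
def block_tariff_bill(kwh, blocks):
    """Bill for `kwh` of use under an increasing block tariff.

    blocks: list of (upper_limit_kwh, price_per_kwh) in ascending order;
            the last upper limit may be float("inf").
    """
    bill = 0.0
    lower = 0.0
    for upper, price in blocks:
        if kwh <= lower:
            break
        # Only the consumption that falls inside this block is charged at its price.
        bill += (min(kwh, upper) - lower) * price
        lower = upper
    return bill


# Hypothetical monthly blocks: limits in kWh, prices in arbitrary currency units.
BLOCKS = [(200, 0.50), (400, 0.75), (float("inf"), 1.00)]

for usage in (150, 300, 500):
    print(usage, "kWh ->", block_tariff_bill(usage, BLOCKS))
```

Because each extra unit is charged at the rate of the block it falls in, heavy users pay a higher average price, which is the sense in which such a tariff targets consumption.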

Because CO₂ emissions and air pollution
largely stem from fossil-fuel use, regional con- 
trol strategies for both should be integrated. 
China implemented regulations in June this 
year and plans to invest 1.7 trillion renminbi 
between 2013 and 2017 to limit urban air 
pollution, including particulates and ozone. 
These regulations suggest phasing out inef- 
ficient industrial boilers, limiting the expan- 
sion of emissions-intensive industries and 
enhancing regulations and market stimuli 
for green-energy development, which would 
accelerate the development of energy saving 
and environmental protection industries. 

China's energy-management system is 
being restructured under new government 
leadership. The streamlining associated with 
the annexing of the State Electricity Regula- 
tory Commission by the National Energy 
Administration should help to harmonize 
energy prices and policies across central and 
local governments and enterprises. A high- 
level governmental organization (such as 
the State Council) is also required to coordi- 
nate energy policies across agencies, such as 
the Ministry of Environmental Protection, 
National Development and Reform Com- 
mission, and provincial governments. 

By tackling these challenges, we believe 
that China can lead the global climate- 
change mitigation movement and create a 
pathway towards sustainable, low-carbon 
development.
SOCIAL PSYCHOLOGY


The gloat factor 


Dan Jones mulls over a study of why we enjoy the misfortunes of others.


In the summer of 2012, best-selling
science writer Jonah Lehrer suffered a
dramatic and public fall from grace. It
became apparent that Lehrer had recycled
the words of at least one other writer and had
even invented quotes from Bob Dylan in his
most recent book, Imagine: How Creativity
Works (Houghton Mifflin Harcourt, 2012).
Lehrer lost his new job at The New Yorker,
his publishers pulled his books and writ-
ers across the Twitterverse and blogosphere
referred to him with scorn. Yet beneath the
righteous indignation of his many critics
lurked a sense of pleasure in seeing this
young, hip, successful author cut down to size.

Why do the misfortunes of others give us
a lift? This is the question explored by social
psychologist Richard Smith in The Joy of
Pain, a breezy but serious exploration of the
phenomenon. Smith's answer is that Schaden-
freude — an emotion as ignoble as envy or
spite, from the German for harm (Schaden)
and joy (Freude) — pays psychological divi-
dends by enabling us to feel better about our-
selves, and our social worth and rank, through
“downward comparison” with others.

For Smith, Schadenfreude is grounded
in our evolved social psychology. Life is a
competitive game, with winners and losers
in the search for status, mates and much
else; and as far as natural selection is
concerned, what matters is not your absolute
level of success, but how much better or
worse you are doing relative to everyone
else. So humans are keenly aware of how
our attributes, skills and successes stack
up against the game's other players, and
when we come off badly in these social
comparisons, our self-esteem takes a hit.

Likewise, seeing those who are above us
in social rank take a fall boosts our own
relative standing and makes us feel good.
Smith discusses experimental studies by
social psychologist Wilco van Dijk and his
colleagues that show that when people's self-
esteem is challenged (by being given bad but
false feedback on a test they've taken), they
are more likely to take pleasure in hearing
about a successful person coming undone.
Similarly, the researchers showed that people
with pre-existing low self-esteem who take
part in studies are prone to Schadenfreude.

Smith describes many routes to Schaden- 
freude, from the relatively passive — such as 
comparing ourselves to people who are down 
and out, and exaggerating negative qualities 
(or dismissing positive qualities) of those 
more successful than us — to actively bring- 
ing about the misfortune of people we envy. 

He also notes a number of factors that 
prime us for Schadenfreude. It is likely to 
bubble up when we think that someone's 
misfortune is a case of just deserts, and never 
more so than when the person is revealed 
to be a hypocrite — such as the many evan- 
gelical preachers exposed as indulging in the 
behaviours they condemn in others. 

Envy also amps up Schadenfreude, as 
memorably articulated by writer Clive 
James’s poem ‘The book of my enemy has
been remaindered [and I am pleased]’ (see
go.nature.com/gr5kdf). Conflicts and com- 
petition between groups are likely to bring 
out Schadenfreude; think of sport, in which 
pleasure in the misfortune of rivals is socially 
acceptable. Psychologist Charles Hoogland 
and his colleagues, for example, have shown 
that committed basketball fans are pleased 
when rival team members suffer even severe 
injuries. And brain-imaging studies by psy- 
chologist Susan Fiske and others revealed that 
when fans of baseball team the Boston Red 
Sox witness their team beating arch-rivals the 
New York Yankees (or vice versa), their brains 
show more activation of reward systems than 
when their team beats a more neutral oppo- 
nent, underscoring the importance of com- 
petitive drive in Schadenfreude. Politics is yet 
another rich seam: recall the celebratory par- 
ties that erupted when former UK prime min- 
ister Margaret Thatcher died earlier this year. 

Schadenfreude fuelled by a combination 
of resentment and our divisive tendency to 
form exclusive, competitive groups can be 
especially potent, bringing out the darkest 
sides of human nature and leading people 
to actively engineer misfortune in other 
groups. Smith suggests that such a process 
plausibly had a role in Nazi propaganda, 
which was explicitly designed to arouse 
resentment, envy and enmity towards 
Jewish people, and so to offer a specious 
justification for their subsequent extreme 
mistreatment and incalculable suffering. 

Smith's portrait of this complex response 
combines experimental studies with many 
well-chosen examples drawn from politi- 
cal scandals, biographies, reality-television 
shows, literature, sitcoms, cartoons and the 
observations of comedians and satirists. 
The Joy of Pain is a real joy to read — and 
completely painless.


Dan Jones is a freelance science writer based 
in Brighton, UK. 
e-mail: dan.jones@multipledrafts.com 




BOOKS & ARTS 




DEVELOPMENT 


Food is handed out at a hospital in the Central African Republic. 


Starved for solutions 


Calestous Juma weighs up a call for a revolution to end world hunger. 


Some 870 million people suffer from
chronic undernourishment, despite
humanity’s best efforts to improve
agricultural productivity, create markets
and boost nutrition. In Betting on Famine,
sociologist Jean Ziegler sets out to provide a
human rights-based approach to addressing
world hunger. The book is a sweeping indict-
ment of global injustice and provides ample
facts and figures. “The destruction, every
year, of tens of millions of men, women, and
children from hunger is the greatest scan-
dal of our era,” says Ziegler, who was United
Nations (UN) special rapporteur on the right
to food from 2000 to 2008.

His main thesis, which is in no way inno-
vative, is that the world is capable of feeding
12 billion people — 5 billion more than now
exist. The main obstacle, in his view, is global
inequality and corporate control of the food
system. The solution, he says, is to return to
the fundamental principles of the right to
food, defined by the UN as having “regular,
permanent and unrestricted access, either
directly or by means of financial purchases,
to quantitatively and qualitatively adequate
and sufficient food corresponding to the cul-
tural traditions of the people”.

Ziegler argues that access to food has been 
one of the most flouted human rights in his- 
tory. He attributes much of the reason for this 
to the dominance of the private sector and an 
unfair global trading system, underpinned by 
what he sees as neo-liberal dogma, such as the 
perceived benefits of privatizing public enter- 
prises. He argues that nothing short of a revo-
lution is needed to curb corruption among
leaders in emerging nations most affected by 
famine, promote pop- 
ular resistance among 
social movements 
around the world, and 
make the right to food 
a policy priority in par- 
liamentary and other 
governance bodies. 

Betting on Famine: Why the World Still Goes Hungry
JEAN ZIEGLER
The New Press: 2013.

© 2013 Macmillan Publishers Limited. All rights reserved

Betting on Famine disappoints for many
reasons, one being that it says nothing new.
Classics such as Susan George’s How the
Other Half Dies (1976) have provided more
incisive assessments of why famine has per- 
sisted despite increases in food production. 
Ziegler admits that much of what needs to 
be done has already been outlined in numer- 
ous UN documents. Furthermore, his book 
is primarily a diatribe against those in power; 
it offers little by way of example or inspiration 
on how to solve world hunger. Appealing to 
revolution is possibly the easiest of intellec- 
tual expeditions. Executing the task is much 
more complex and requires the involvement 
of the same corporations and governments 
that the book incessantly admonishes. 
There is an equally revolutionary alter- 
native that Ziegler does not acknowledge: 
empowering the poor by building their capac- 
ity to address hunger through improved agri- 
cultural practices, training of farmers, better 
infrastructure and access to markets. Follow- 
ing the 1974 coup in Ethiopia, for instance, 
Marxist leaders embarked on a peasant revo- 
lution aimed at overthrowing landowners 
in the hope that this would lead to the mod- 
ernization of agriculture. It did not work. But 
now the country’s government focuses on 
promoting and expanding cooperation
between farmers and the same corpora- 
tions that Ziegler wishes to send to the gal- 
lows. Partly because of improvements in 
agricultural production, Ethiopia's econ- 
omy has registered an average growth of 
8% per year in the past decade. 

Rights cannot be wished into existence. 
They need institutions to become realities. 
In 2010, Ethiopia created the Agricultural 
Transformation Agency (ATA), mirroring 
elements of Brazil's Agricultural Research 
Corporation (Embrapa), which has helped 
to bring technical support and credit to 
farmers. The ATA focuses on empowering 
farmers to become more entrepreneurial 
by helping them to improve productivity 
and participate in local and global mar- 


kets. Ethiopia is .. 
also nowa member Rig hts cannot 


of the Grow Africa be wished into 


consortium, which existence. 
includes private They need 
enterprises, the institutions 
African Unionand to become 
the World Eco- realities.” 
nomic Forum, and 
has pledged to invest more than US$3.5 
billion in African agriculture. China, India 
and Brazil, among other countries, are also 
actively tackling hunger with more inclu- 
sive approaches, accommodating all major 
players including private corporations. 

Ziegler rightly emphasizes the role of 
farmers, but fails to note how technical 
training can strengthen their political 
influence. Innovations such as the US 
land-grant university model, formal- 
ized 150 years ago to bring agricultural 
research, teaching and extension under 
one roof, played a key part in educating 
US farmers. The Green Revolution that 
helped countries such as India and Mexico 
to avert major famines relied heavily on 
scientific research, participation of the 
private sector and the upgrading of farm- 
ers’ skills. 

The right to food will continue to be 
a major global challenge as pressure on 
natural resources increases. But solutions 
will not come from traditional appeals for 
popular uprisings. They will come from 
increased inclusivity in partnerships, 
involving rather than punishing private 
corporations. To feed the hungry, the 
world needs new approaches that expand 
the practical use of human creativity, not 
more pleas for hollow revolutions.


Calestous Juma is professor of the 
practice of international development at 
Harvard Kennedy School in Cambridge, 
Massachusetts, and author of The New 
Harvest: Agricultural Innovation in 
Africa.


Godless chronicles


Glenn Branch goes for a dip in the antitheologic. 


When the producers of the 2008
creationist film Expelled asked to
interview PZ Myers, they misrepresented the nature of their project and the
purpose of the interview. By the time the film 
screened, Myers knew that his brief clip would 
portray the scientific establishment as dog- 
matically suppressing dissent from evolution. 
Ironically, when he arrived at the cinema, he 
was excluded — unlike a colleague who was 
also interviewed for the film. Myers repaired 
to a nearby computer store to post a hilarious
account of his expulsion on his widely read 
blog, Pharyngula. Expelled was quickly show- 
ered with unwelcome publicity as a result. 
Myers is a developmental biologist, 
who named his blog after the pharyngula 
stage of embryonic development — both,
he jokes, are notable for the appearance of
brain and jaw. Pharyngula is a freewheeling
mix of explanations of developmental biol- 
ogy, denunciations of creationism, com- 
mentary on politics, feuds with critics and 
rivals, and the sort of in-jokes and recurrent 
features that typify the blogosphere, enliv- 
ened by a raucous chorus of commenters. 
Its slogan is: “Evolution, development, and 
random biological ejaculations from a god- 
less liberal”. 

A major theme of Pharyngula, and Myers's 
first book, The Happy Atheist, is what he 
views as the incompatibility of science and 
religion. In addition to excoriating various 
absurdities and atrocities that he associates 
with faith, such as the bad science deployed 
by anti-abortion zealots, Myers repeatedly 
asserts that science and religion are necessar- 
ily in conflict: “One is a method of analysis 
and experiment; the other is pretense and 
lies.” He is fierce with regard to the propo- 
nents of old-fashioned creation science 
(“trying to get their Old Testament super- 
hero to adhere to the rules of physics, chem- 
istry, biology, and ordinary common sense”) 
and the adherents of newfangled intelligent 
design (who “hide the bearded old sky god 
from the public eye”). He also castigates sci- 
entists who accept evolution while retaining 
their faith. 

Whatever Myers’s target, his weapons 
are taken from the arsenal of ridicule. He is 
in good company — writers such as Jona- 
than Swift and George Orwell spring to 
mind. Myers’s prose, although serviceable,
isn’t quite in the same class, but sometimes
reaches lyrical heights. Explaining his deci- 
sion to bury, rather than burn, unwanted 
books of scripture sent for his spiritual
instruction, he exults
“as nematodes writhe
over the surfaces,
etching the words
with slime and replacing the follies of dead
men with the wisdom
of worms”. Myers’s
favourite weapon is the
extended metaphor,
deployed to expose
his targets as arbitrary
and absurd. He wields
it adroitly, comparing religious diversity to
hat variety and theologians to courtiers fawning over the Emperor’s new clothes. These
conceits are often amusing and occasionally
instructive, but the tactic is cheap.

The Happy Atheist
PZ Myers
Pantheon: 2013.

Whether infuriating or invigorating, ridi- 
cule is no substitute for a considered critique, 
and Myers often fails to do justice to his tar- 
gets. For example, his analysis of the idea 
that God guides evolution by acting unde- 
tectably at the quantum level, if amusing, is 
a popular rather than a scholarly treatment, 
and incorporates value judgements that 
are unsupportable by science. Myers might 
respond that his targets are too ridiculous to 
warrant anything more serious, but such a 
response presupposes, rather than compels, 
agreement. 

The chief problem with The Happy Athe- 
ist, however, is that it seems to break no new 
ground. By my count, Pharyngula posts pro- 
vide the basis for at least 26 of the 38 essays 
and 5 more are adapted from a talk he gave 
in 2010. 

Admirers and detractors alike will be dis- 
appointed by the book as a missed oppor- 
tunity for Myers to refine, systematize and 
extend his thoughts on science and religion. 
It is not comparable with Jason Rosenhouse’s
Among the Creationists (Oxford Univer- 
sity Press, 2012), Steve Stewart-Williams’s 
Darwin, God and the Meaning of Life (Cam- 
bridge University Press, 2010), or that 
‘summa antitheologica’ of our day, The God
Delusion (Houghton Mifflin Harcourt, 2006) 
by Richard Dawkins. It was Dawkins, by the 
way, who was admitted to the screening of 
Expelled when Myers was excluded. Was 
Voltaire’s prayer, “O Lord, make our enemies
ridiculous,” ever better answered?

National Center for Science Education in


Gain-of-function
experiments on H7N9


Since the end of March 2013, 
avian influenza A viruses of 
the H7N9 subtype have caused 
more than 130 human cases
of infection in China, many of
which were severe, resulting
in 43 fatalities. Although this
A(H7N9) outbreak is now
under control, the virus (or one 
with similar properties) could 
re-emerge as winter approaches. 

To better assess the pandemic 
threat posed by A(H7N9) 
viruses, investigators from the 
NIAID Centers of Excellence 
in Influenza Research and 
Surveillance and other expert 
laboratories in China and 
elsewhere have characterized 
the wild-type avian A(H7N9) 
viruses in terms of host range, 
virulence and transmission, and 
are evaluating the effectiveness 
of antiviral drugs and vaccine 
candidates. However, to 
fully assess the potential risk 
associated with these novel 
viruses, there is a need for further 
research, including experiments 
that may be classified as ‘gain of
function’ (GOF).

Here we outline the aspects 
of the current situation that 
most urgently require additional 
research, our proposed studies, 
and risk-mitigation strategies. 

The A(H7N9) virus 
haemagglutinin protein 
has several motifs that are 
characteristic of mammalian- 
adapted and human influenza 
viruses, including mutations 
that confer human-type 
receptor binding and enhanced 
virus replication in mammals. 
The pandemic risk rises 
exponentially should these 
viruses acquire the ability to 
transmit readily among humans. 

Reports indicate that 
several A(H7N9) viruses from 
patients who were undergoing 
antiviral treatment acquired 
resistance to the primary 
medical countermeasure — 
neuraminidase inhibitors (such 
as oseltamivir, peramivir and 
zanamivir). Acquisition of
resistance to these inhibitors by
A(H7N9) viruses could increase
the risk of serious outcomes of
A(H7N9) virus infections.

The haemagglutinin proteins
of A(H7N9) viruses have a
cleavage site that is consistent
with a low-pathogenic
phenotype in birds. In the
past, highly pathogenic H7
variants (with basic amino-acid
insertions at the cleavage site
that enable the spread of the
virus to internal organs) have
emerged from populations
of low-pathogenic strains
circulating in domestic
gallinaceous poultry.

Normally, epidemiological
studies and characterization of
viruses from field isolates are
used to inform policy decisions
regarding public-health responses
to a potential pandemic. However,
classical epidemiological tracking
does not give public-health
authorities the time they need to
mount an effective response to
mitigate the effects of a pandemic
virus. To provide information that
can assist surveillance activities
— thus enabling appropriate
public-health preparations to be
initiated before a pandemic —
experiments that may result in
GOF are critical.

Therefore, after review and
approval, we propose to perform
experiments that may result in
GOF (see ‘Proposed gain-of-
function experiments’).

H7N9 INFLUENZA RESEARCH
Proposed gain-of-function experiments

• Immunogenicity. To develop
more effective vaccines and
determine whether genetic
changes that confer altered
virulence, host range or
transmissibility also change
antigenicity.

• Adaptation. To assist with risk
assessment of the pandemic
potential of field strains and
evaluate the potential of
A(H7N9) viruses to become
better adapted to mammals,
including determining the
ability of these viruses to
reassort with other circulating
influenza strains.

• Drug resistance. To
assess the potential for drug
resistance to emerge in
circulating viruses, evaluate the
genetic stability of mutations
conferring drug resistance,
and evaluate the efficacy of
combination therapy with
antiviral therapeutics. Also, to
determine whether A(H7N9)
viruses could become resistant
to available antiviral drugs, and
to identify potential resistance
mutations that should be
monitored during antiviral
treatment.

• Transmission. To assess
the pandemic potential of
circulating strains and perform
transmission studies to
identify mutations and gene
combinations that confer
enhanced transmissibility in
mammalian models (such as
ferrets and guinea pigs).

• Pathogenicity. To aid risk
assessment and identify
mechanisms, including
reassortment and changes to
the haemagglutinin cleavage
site, that would enable
circulating A(H7N9) viruses to
become more pathogenic.

All experiments proposed 
by influenza investigators are 
subject to review by institutional 
biosafety committees. The 
committees include experts 
in the fields of infectious 
disease, immunology, biosafety, 
molecular biology and public 
health; also, members of the 
public represent views from 
outside the research community. 
Risk-mitigation plans for 
working with potentially 
dangerous influenza viruses, 
including the 1918 virus and 
highly pathogenic avian H5N1 
viruses, will be applied to conduct
GOF experiments with A(H7N9)
viruses (see Supplementary
Information at go.nature.com/fstdy1).
Additional reviews
may be required by the funding
agencies for proposed studies of
A(H7N9) viruses.

The recent H5N1 virus- 
transmission controversy 
focused on the balance of risks 
and benefits of conducting 
research that proved the ability 
of the H5N1 virus to become 
transmissible in mammals (see 
www.nature.com/mutantflu). 
These findings demonstrated 
the pandemic potential of H5N1 
viruses and reinforced the need 
for continued optimization 
of pandemic-preparedness 
measures. Key mutations 
associated with adaptation 
to mammals, included in 
an annotated inventory for 
mutations in H5N1 viruses 
developed by the US Centers for 
Disease Control and Prevention, 
were identified in human isolates 
of A(H7N9) viruses. Scientific 
evidence of the pandemic threat 
posed by A(H7N9) viruses, 
based on H5N1 GOF studies,
factored in risk assessments by 
public-health officials in China, 
the United States and other 
countries. 

Since the H5 transmission 
papers were published, 
follow-up scientific studies have 
contributed to our understanding 
of host adaptation by influenza 
viruses, the development of 
vaccines and therapeutics, and 
improved surveillance. 

Finally, a benefit of the 
H5N1 controversy has been the
increased dialogue regarding 
laboratory biosafety and dual- 
use research. The World Health 
Organization issued laboratory 
biosafety guidelines for 
conducting research on H5N1 
transmission and, in the United 
States, additional oversight 
policies and risk-mitigation 
practices have been put in place 
or proposed. Some journals 
now encourage authors to 
include biosafety and biosecurity 
descriptions in their papers, 
thereby raising the awareness of
researchers intending to replicate
experiments.

The risk of a pandemic caused 
by an avian influenza virus 
exists in nature. As members 
of the influenza research 
community, we believe that the 
avian A(H7N9) virus outbreak 
requires focused fundamental 
and applied research conducted 
by responsible investigators 
with appropriate facilities and 
risk-mitigation plans in place. To 
answer key questions important 
to public health, research that 
may result in GOF is necessary 
and should be done. 

Ron A. M. Fouchier* Erasmus 
Medical Center, Rotterdam,
the Netherlands.
r.fouchier@erasmusmc.nl 
Yoshihiro Kawaoka* University 
of Wisconsin-Madison, 
Wisconsin, USA. 
kawaokay@svm.vetmed.wisc.edu 
*On behalf of 22 co-authors (see 
go.nature.com/fstdy1 for full list). 


Extra oversight for 
H7N9 experiments 


The US Department of Health 
and Human Services (HHS) 
announces a new review process 
for certain gain-of-function 
(GOF) experiments on the avian 
influenza A (H7N9) virus, some 
of which are proposed this week 
by influenza scientists (R. A. M. 
Fouchier et al. Nature 500, 
150-151; 2013). 

Specifically, before being 
undertaken using HHS 
funds, any experiments that 
are reasonably anticipated to 
generate H7N9 viruses with 
increased transmissibility 
between mammals by respiratory 
droplets will undergo an 
additional level of review by the 
HHS. 

The HHS review will consider 
the acceptability of these 
experiments in light of potential 
scientific and public-health 
benefits as well as biosafety and 
biosecurity risks, and will identify 
any additional risk-mitigation 
measures needed. The review 
will be carried out by a standing 
multidisciplinary panel of federal 
experts with backgrounds in 
public health, medicine, security, 
science policy, global health, risk 
assessment, US law and ethics. 


This approach, similar to that 
for certain H5N1 influenza virus 
experiments (see go.nature.com/vpmplf),
allows the HHS
to focus special oversight efforts 
on experiments of concern while 
allowing routine characterization 
and other fundamental research 
to proceed rapidly, thereby 
enabling a robust public-health 
response. 

GOF studies can provide 
important insights into how 
the A(H7N9) virus adapts to 
mammalian hosts, causes disease 
and spreads to other hosts, but 
they may also pose biosafety 
and biosecurity risks. To ensure 
that research involving H7N9 
virus is conducted safely and 
securely, the US Centers for 
Disease Control and Prevention 
recently re-examined the 
requisite biosafety conditions 
for conducting experiments 
involving H7N9 and, in June 
2013, issued interim risk- 
assessment and biosafety-level 
recommendations (see
go.nature.com/gknn9a).
Harold W. Jaffe Centers for 
Disease Control and Prevention, 
Atlanta, Georgia, USA. 
Amy P. Patterson National 
Institutes of Health, Bethesda, 
Maryland, USA. 
pattersa@od.nih.gov 
Nicole Lurie Department of 
Health and Human Services, 
Washington DC, USA.


Follow Obama’s lead
and take a pay cut 


In considering the impact
of the US budget sequester
on science (see, for example, 
Nature 499, 147-148; 2013), 
I see no mention of salary 
reductions. A 5% reduction 
in the salaries of federally 
supported science staff, 
including administrative and 
agency personnel, would 
significantly reduce the need to 
cut science programmes. 

US scientists are not paupers: 
NASA scientists, for instance, 
are paid up to US$160,000 
a year, with generous fringe 
benefits. Plenty of professors 
at leading US universities 
make much more. A 5% cut 
to 12-month and summer 
salaries would not leave anyone 
destitute. 

Such a reduction would be 
much more effective than any 
presentations to Congress in 
showing that scientists care 
about their projects and are 
willing to share the pain of 
bringing US federal expenditure 
under control. 

I doubt that any scientist 
would refuse to accept a grant 
offered on the proviso that the 
salary rate be reduced by 5%, 
if the alternative were no grant 
at all. President Barack Obama 
took a pay cut to show the way. 


Let’s follow his lead. 
Peter Foukal Nahant, 
Massachusetts, USA. 
pvfoukal@comcast.net 


Three reasons for 
eco-label failure 


The fisheries industry promotes 
third-party eco-labels that 
signify sustainability, similar
to those used in forestry and
tourism (see Nature http://doi.org/nb5; 2013).
In my view,
these fail for three reasons. 

First, consumers care strongly 
that labels for health and quality 
standards are accurate because 
they affect individuals, but care 
much less about eco-labels 
because their effects are spread 
across society. 

Second, industries tend 
to use weak eco-labels in 
political games to avoid strong 
regulation. 

Third, ineffective eco-labels 
closely mimic accurate ones. 
Because there are no adverse 
consequences for consumers 
who cannot tell them apart,
a high proportion of mimics
persists. 

Eco-labels are thus no 
substitute for eco-laws. 

Ralf Buckley Griffith University, 
Gold Coast, Queensland, 
Australia. 
r.buckley@griffith.edu.au


Accurate maps of visual circuitry


Such is the brain’s complexity that even small neural circuits contain hundreds of neurons making thousands of connections. 
Connectivity and optical analyses provide close-up views of two such circuits. SEE ARTICLES P.168, P.175 & LETTER P.212 


RICHARD H. MASLAND 


Understanding the biological machinery
from which perception, action and
thinking are built is not an easy
undertaking. A big difficulty is that neuroscientists must deal with a problem of spatial
scale — one in which components range 
from nanometre-sized synaptic junctions 
between neurons to centimetre-long con- 
nections between brain regions — and study 
these scales simultaneously. Three papers¹⁻³
in this issue attack these problems of scale.
Two of them (by Helmstaedter et al.¹ and
Takemura et al.²) use computational techniques to expand the field of neurons that
can be encompassed in a high-resolution
view. The third study (by Maisak et al.³)
combines genetic and optical methods to 
record the activity of neurons that until now 
have been impossible to monitor owing to their 
small size. All three take as a model the retina, 
the first image-processing element in the chain 
that leads to visual perception. 

The mammalian retina contains more than 
60 different types of neuron, each of which 
has a distinct morphology and carries out a 
different function⁴. Within the retina, photo-
receptor cells sense light, and their output is 
processed by amacrine, horizontal and bipolar 
cells. Downstream, roughly 20 different types 
of retinal ganglion cell transmit the final coded 
signal — 20 different representations of the 
visual input — to the brain. Unsurprisingly, 
therefore, sorting out neuronal connectivity in 
the retina has been a daunting task. Helmstae- 
dter et al. (page 168) now report a connectome 
(a list of all synaptic connections) for an inner 
layer of the mouse retina. They achieve this by 
serial tissue sectioning and electron micros- 
copy, followed by digital reconstruction of 
cells within the virtual three-dimensional solid 
that results. 

The analysis reveals patterns of connec- 
tions that could account for the stimulus 
selectivity of two types of ganglion cell. More 
fundamentally, the reconstruction, which con- 
tains 950 neurons (Fig. 1a), allows definitive 
classification of the types of bipolar cell. With 
only a slight refinement, the new classifica- 
tion matches extremely well with the exist- 
ing understanding of these cells’, which was 


Retinal 
layer 


Figure 1 | The mechanism of motion discrimination in the visual system’. 
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a, Reconstruction of 24 out 


of 950 mouse neurons between two retinal layers, based on an electron microscopy data set. b, Photoreceptor 
cells slightly separated in space mediate inputs to Mil and Tm3 cells, through intermediary L1 and L2 cells. 
Their outputs converge on the T4 cells, which, because of the spatial separation of the inputs, can discriminate 
movements occurring in different directions. A similar mechanism is thought to occur in TS cells, although 
the intermediary cells are unknown. T4 and T5 cells are selectively responsive to light edges (ON) or dark 
edges (OFF), respectively. Thus, at the level of these cells, the visual input is broken down into eight separate 
components, each representing ON or OFF activity and one of four directions of motion (only forward 
motion is shown; purple arrows). This information is then recombined by the tangential cells, each of which is 
sensitive to one of four cardinal directions, and to both ON and OFF edges. For anatomically correct details of 


the diagram, see Figure 4 of ref. 2. 


based largely on the identification of molecular 
markers using light microscopy. Helmstaedter 
and colleagues’ work, however, adds value by 
providing enormously more precise descrip- 
tions of bipolar-cell structure and by effec- 
tively acting as a positive control, increasing 
confidence that analysis of the amacrine and 
ganglion cell types, which have resisted classi- 
fication by previous techniques, will be equally 
definitive. And this is just the beginning: once 
these cell types are classified, the same basic 
methods should allow the synaptic connec- 
tions among them to be deciphered. 
Takemura et al. (page 175) and Maisak and 
co-workers (page 212) report progress on a 
classic problem of neural computation — the 
detection of visual motion. Their test system 
is the eye of the fruitfly, an animal that must 
rapidly navigate during flight and that is especially effective at dodging predators (doubters
are invited to swat one). It is easy to make simple models of motion detection⁶,⁷, but pinning
the mechanism to precise neural events has 
been much harder. Whereas photoreceptor 
cells cannot detect direction, downstream neu- 
rons called tangential cells are robustly tuned 
to the direction of movement. Somewhere in 
between lies the neural mechanism that creates 
the directional discrimination, but the crucial 
neurons, called T4 and T5, are too small for 
ordinary electrical recording. Maisak et al. got 
around this difficulty by recording activity 
optically, using an indicator protein introduced 
into the cells by genetic techniques. 

The authors demonstrate that T4 and T5 
detect visual movement, with subsets of each 
being selective for one of four cardinal direc- 
tions: upward, downward, front to back, and 
back to front. Furthermore, these cells are sensitive to opposite visual contrasts — T4 cells
respond to light ON and so are sensitive to light
edges, whereas T5 cells respond to light OFF
and are sensitive to dark edges. The authors’ 
genetic-knockout experiments not only con- 
firm this optical observation but also show 
that T4 and T5 are the sole pathways mediat- 
ing these functions, with no other cells being 
able to step in and carry the message. Thus, a 
fly initially breaks down moving visual inputs 
into a total of eight components: bright edges 
moving up, down, forward or backward, and 
dark edges moving along the same four axes. 

But how do T4 and T5 actually detect the 
direction of movement? Takemura and collab- 
orators’ fly connectome suggests a place to look 
for the answer. They show that just upstream of 
T4 lies a pair of neurons, termed Mi1 and Tm3,
which report on narrowly separated points in 
visual space. Because of that separation, the 
pair could provide the inputs that T4 needs to 
discriminate direction (Fig. 1b). Using Maisak 
and colleagues’ recording technology, it may 
be possible to optically record from Mi1 and
Tm3. If all goes well, this could bring the 
50-year search for the mechanism of direction 
selectivity to an end. 
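
The idea that two inputs sampling narrowly separated points in visual space are enough to discriminate direction can be illustrated with a toy delay-and-correlate detector, in the spirit of the simple motion-detection models mentioned earlier. The sketch below is only an illustration under assumed conditions: the function names, the one-step delay and the idealized edge stimulus are hypothetical choices, and nothing here is taken from the circuits or analyses reported in the papers under discussion.

```python
def correlator_response(left, right, delay=1):
    """Opponent delay-and-correlate detector on two sampled input signals.

    Positive output: the stimulus reached `left` before `right`.
    Negative output: the stimulus reached `right` before `left`.
    (Illustrative model only; `delay` is an assumed parameter.)
    """
    total = 0.0
    for t in range(delay, len(left)):
        # Delayed left input multiplied by the current right input.
        forward = left[t - delay] * right[t]
        # Mirror-image term for the opposite direction.
        backward = right[t - delay] * left[t]
        total += forward - backward
    return total


def moving_edge(n_steps, arrival_step):
    """Brightness at one point: 0 until a light edge arrives, then 1."""
    return [1.0 if t >= arrival_step else 0.0 for t in range(n_steps)]


# A light edge that reaches the left point one time step before the right point.
left_first = correlator_response(moving_edge(10, 3), moving_edge(10, 4))
# The same edge travelling the other way.
right_first = correlator_response(moving_edge(10, 4), moving_edge(10, 3))
# Both points brighten at once (no motion between them).
simultaneous = correlator_response(moving_edge(10, 3), moving_edge(10, 3))

print(left_first, right_first, simultaneous)  # prints: 1.0 -1.0 0.0
```

The sign of the output, not its size, carries the directional information: neither input signal alone distinguishes the two directions, but their delayed comparison does.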

The connectomic approach has now proved 
its importance for studying fly eyes and mouse 
retinas, but sceptics will still doubt that we can 
make the jump from these miniature neuronal 
circuits to ‘real brains’; the intrinsic circuits of 
the cerebral cortex are some ten times larger 
than those of the retina, and this spatial scale 
is dwarfed again by the distances that connect 
different brain regions. 

One obstacle is the need for very large tissue 
sections. Another difficulty is that of image 
segmentation, which is required for tracing 
thin neuronal processes through the thicket 
of neighbours in serial sections. Because digi- 
tal solutions have failed, the task is currently 
assigned to large teams of human observers, 
but this will be impractical on larger spatial 
scales. Improvements in fixation and stain- 
ing might make the processes more easily 
discriminable, and digital technology may yet 
save the day, because in principle any task that 
can be done by human observers can be done 
by a computer. Tracing processes is essentially
a problem of pattern recognition, for which the 
technology is evolving rapidly. 

A final question concerns the cost-effective- 
ness of the connectomic approach. Is it to be 
the exclusive province of a few deep-pocketed 
laboratories? Here, the answer is clear: the 
researchers involved have stressed that the 
connectomic reconstructions will be public 
resources, which can be used for different pur- 
poses by anyone. To be useful, an archive may 
need to be accompanied by a user-oriented 
interface, because computer code this com- 
plex will be hard for workers other than its 
creators to use. The effort involved in creating 
and curating such a public resource should be 
worth it, because the archives could be the big- 
gest contribution of this work to neuroscience. 


Very many structural problems could be 
attacked using the same original material.


Richard H. Masland is in the Departments 
of Ophthalmology and Neurobiology, 
Harvard Medical School, Boston, 
Massachusetts 02114, USA. 

e-mail: richard_masland@meei.harvard.edu 


1. Helmstaedter, M. et al. Nature 500, 168-174 (2013).
2. Takemura, S. et al. Nature 500, 175-181 (2013).
3. Maisak, M. S. et al. Nature 500, 212-216 (2013).
4. Masland, R. H. Neuron 76, 266-280 (2012).
5. Wässle, H., Puller, C., Müller, F. & Haverkamp, S. J. Neurosci. 29, 106-117 (2009).
6. Reichardt, W. in Sensory Communication (ed. Rosenblith, W. A.) 303-317 (MIT Press, 1961).
7. Barlow, H. B. & Levick, W. R. J. Physiol. (Lond.) 178, 477-504 (1965).


SOLAR SYSTEM


Saturn’s tides control
Enceladus’ plume 


Data obtained by the Cassini spacecraft show that the plume of ice particles at the 
south pole of Saturn’s moon Enceladus is four times brighter when the moon is 
farthest away from the planet than when it is closest. SEE LETTER P.182 


JOHN SPENCER 


An analysis of images of Saturn’s moon
Enceladus taken from the Cassini
Saturn orbiter, reported by Hedman
et al.¹ on page 182 of this issue, shows that
the output of the giant plume of ice particles, 
which jets out of fractures in the south polar 
region of the moon, is controlled by diurnal 
changes in the tidal stresses from Saturn*. The 
authors find a remarkably strong and simple
relationship between the brightness of the plume
and Enceladus’ position in its orbit around
Saturn, providing dramatic confirmation of
predictions made in 2007 from a tidal-stress
model².

*This article and the paper under discussion¹ were
published online on 31 July 2013.

Tidal forces from Saturn are the ultimate 
power source for the extravagant geological 
activity on Enceladus’, a small ice world just 
500 kilometres in diameter. Enceladus’ 1.37- 
day orbit around Saturn is slightly eccentric, 
as a result of the periodic gravitational influ- 
ence of the larger moon Dione. The daily vari- 
ations in the tidal stresses from Saturn due to 


Figure 1 | Enceladus’ ice-particle jets. The image shows jets of ice particles emerging from the four 
‘tiger-stripe fractures at Enceladus’ south pole, photographed in visible light by cameras on the Cassini 
Saturn orbiter in 2009. Hedman and colleagues’ analysis’ of hundreds of lower-resolution infrared 
images, taken by Cassini’s VIMS instrument, of the plume that results from the combination of these 
jets, reveals that the plume responds to daily changes in the tidal stresses from Saturn on the fractures, 
brightening when the fractures are in tension. 
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50 Years Ago 


A hazard to the animal that has 
always accompanied war is when 
persons unaccustomed to animals 
are drafted to animal units; in the 
Second World War this was a very 
common occurrence, because by 
then, so few persons had experience 
with horses. Many loved the job 
and did it well, but others were 
probably disappointed in a non- 
mechanical role and not very 
efficient. Mules reported to be 
vicious were often underworked 
and overfed: in any event for a tyro 
to be placed in charge of an army 
mule could be described as an 
experience. 

From Nature 10 August 1963 


100 Years Ago 


A very remarkable red-water 
phenomenon is at present 
observable in a small pond in 
Broad Oak Park, Worsley, near 
Manchester, just in front of the 
seventh tee on the golf course. The 
surface of the pond — at any rate 
at times — is covered in places 
with an almost blood-red scum, 
which seems to float on the surface 
film like fine dust. The scum 
sometimes assumes a greenish hue. 
Microscopical examination shows 
that it is due to the presence of 
immense numbers ofa large species 
of Euglena, the green chlorophyll 
of which ... is more or less replaced 
by red haematochrome ... Since 
writing the above I have been able 
to observe how the Euglenae reach 
the surface of the water. They 
evidently secrete some sort of slime 
in which they become entangled. 
Bubbles of oxygen gas, given off 

by the Euglenae in the presence 

of sunlight, are also caught in this 
slime, and when these reach a 
certain size they rise to the surface, 
trailing strings of slime, with 
numerous entangled Euglenae, 
after them. 

From Nature 7 August 1913 


this eccentricity distort Enceladus and dump 
gigawatts of frictional heat into its interior. This 
heat powers the ejection of the plume of water 
vapour and ice particles, discovered by Cassini 
in 2005, from four parallel warm fractures 
called ‘tiger stripes’, which surround the moon's 
south pole (Fig. 1). 

Enceladus is thus one of the few places 
beyond Earth where we can watch geology 
happen in real time, giving us a primer for 
understanding other, less active, icy worlds. 
The ice particles in the plume are salty, hint- 
ing at a subsurface source of liquid water and 
a probable subsurface ocean‘, and the plume 
gases include complex hydrocarbons and other 
organic compounds’. The likely presence of 
liquid water and complex organic chemistry 
makes Enceladus especially intriguing as a 
potential habitat for extraterrestrial life, pro- 
viding additional motivation for investigating 
its interior. 

The 2007 modelling study’ noted that the 
rhythmic tidal stresses that power Encela- 
dus’ activity would also put the tiger-stripe 
fractures into alternating states of tension 
and compression during each orbit of Saturn. 
The study predicted that the majority of the 
tiger stripes would be in compression when 
Enceladus is closest to Saturn (periapse), and 
in tension when it is farthest away (apoapse). 
Tensional stresses would plausibly open 
pathways for the venting of plume gases and 
particles, and thus increase plume activity at 
apoapse. 
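
The geometry behind this prediction can be illustrated with a short calculation. The Python sketch below is only an illustration, not the 2007 stress model: it solves Kepler's equation for the moon's distance from Saturn over one orbit and converts that distance into a relative tide-raising strength, which scales as the inverse cube of distance. The eccentricity value (0.0047) and the function names are assumptions introduced here and do not come from the article.

```python
import math


def eccentric_anomaly(mean_anomaly, e, tol=1e-12):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton iteration."""
    E = mean_anomaly
    for _ in range(50):
        step = (E - e * math.sin(E) - mean_anomaly) / (1.0 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


def distance_ratio(mean_anomaly, e):
    """Orbital distance divided by the semi-major axis: r/a = 1 - e*cos(E)."""
    return 1.0 - e * math.cos(eccentric_anomaly(mean_anomaly, e))


# Assumed orbital eccentricity of Enceladus (illustrative, not from the article).
E_ENCELADUS = 0.0047

for label, M in [("periapse", 0.0),
                 ("quarter orbit", math.pi / 2),
                 ("apoapse", math.pi)]:
    r = distance_ratio(M, E_ENCELADUS)
    # Tide-raising strength scales as 1/r^3, normalised here to r = a.
    print(f"{label:>13}: r/a = {r:.4f}, relative tidal strength = {r ** -3:.4f}")
```

Under these assumptions the tide-raising strength changes by only a few per cent between periapse and apoapse. Turning that variation into stresses on individual fractures requires a full elastic-shell model, which this sketch does not attempt.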

Cassini has so far made 20 close fly-bys of 
Enceladus to investigate its surface and interior 
and to sample its plume. But the plume is large 
enough and bright enough to be seen by remote 
sensing from longer range, permitting more 
frequent study both with Cassini's visible-wave- 
length cameras and with its Visual and Infrared 
Mapping Spectrometer (VIMS). Hints of the 
predicted variability in individual plume jets 
were seen in an early study’, using a relatively 
small number of images from Cassini’s cam- 
eras. Hedman and colleagues’ work, which is 
based on VIMS infrared images, was able to 
reach more definitive conclusions, thanks to 
systematic analysis of the spectrometer’s much 
larger data set of 252 plume images. 

The large data set allowed the team to tease 
apart orbit-related temporal variations in 
plume brightness from possible brightness 
variations caused by long-term changes and by 
illumination geometry (the micrometre-sized 
ice particles in the plume brighten greatly when 
backlit). The temporal variations revealed by 
this analysis are simple and dramatic — the 
plume is consistently about four times brighter 
when Enceladus is at apoapse than when it is at 
periapse, just as predicted by the 2007 model’. 
This provides strong evidence that the tiger- 
stripe fractures do indeed open and close 
each day in response to diurnal tidal stresses, 
controlling the plume in the process. 

Geology, dealing as it does with the complex 
behaviours and long memories of materials in 
the solid state, tends to be a messy business. 
So it is always startling and instructive when 
simple patterns like this emerge. The strength 
of the plume-orbit relationship is a strong con- 
straint on how Enceladus works, and a valu- 
able guide to future research. For instance, the 
2007 tidal-stress model treats Enceladus’ crust 
as a thin elastic shell that is detached from the 
interior by a fluid layer, such as a global ocean. 
Global-ocean models have fallen out of favour 
for Enceladus, because it is difficult to keep a 
global ocean from freezing’, and a regional 
south polar ocean* is now considered more 
likely. Such refined models can now be tested 
and constrained by their ability to match the 
observed plume behaviour. 

Also, Cassini’s optical cameras, with 
50 times higher spatial resolution than that of 
VIMS, can provide more precise constraints 
on models of internal structure by tracking 
the variability of the individual jets that make 
up the plume’. Jet variability can thus provide 
knowledge of local stresses on the fractures 
from which the jets emanate, giving another 
means of investigating local subsurface condi- 
tions. The VIMS study itself hints at changes 
in plume ejection speed at different positions 
in the orbit, offering another handle on how 
the plume reaches the surface. No other world 
beyond Earth allows such detailed analysis of 
active geophysical processes. 

Enceladus displays other strikingly clear-cut 
patterns. For example, its several geological 
provinces, defined by age and style of surface 
deformation, are arranged with almost perfect 
symmetry around its spin axis and the direc- 
tion of Saturn”. Equally strange is the geo- 
metric simplicity of the four active tiger-stripe 
fractures, all roughly the same length (about 
130 km), and evenly spaced about 35 km apart. 
As with the plume behaviour, such simple pat- 
terns must point to important truths, but these 
other puzzles remain mysterious, and defini- 
tive explanations await future research. 


John Spencer is at Southwest Research 
Institute, Boulder, Colorado 80302, USA. 
e-mail: spencer@boulder.swri.edu 
BIOCHEMISTRY 


Curbing the excesses 
of low demand 


Metabolic processes are regulated by the relative need for the end product, but 
this control mechanism may fail if demand is very low. A safety mechanism that 
copes with low demand has been discovered in bacteria. SEE LETTER P.237 


ATHEL CORNISH-BOWDEN 


In bacterial cells, the overproduction 
of metabolites is normally avoided by 
mechanisms that are similar in princi- 
ple to control systems in engineering. In this 
issue, Reaves et al.¹ (page 237) report what 
happens in mutant bacteria that lack a suppos- 
edly essential control mechanism to prevent 
excessive production of pyrimidine nucleo- 
tides (metabolites that act as building blocks 
for the synthesis of genes, but which are poten- 
tially toxic if allowed to accumulate). Instead 
of observing the expected accumulation, the 
authors discovered a mechanism in which 
excess nucleotides are eliminated. In so doing, 
they identified a plausible role for an enzyme 
whose physiological function had hitherto 
been unknown*. 

*This article and the paper under discussion¹ were 
published online on 31 July 2013. 

We all know that a kitchen sink is liable to 
overflow if the tap is left on with the plughole 
blocked. In most domestic sinks this danger is 
averted, at least partially, by having an overflow 
outlet near the top. But in more elaborate engi- 
neered systems, such as the domestic toilet, an 
overflow is avoided by means of negative feed- 
back: as soon as the water in the tank reaches a 
certain level, the inflow is switched off. Bacte- 
rial metabolism is in many ways similar, in that 
feedback mechanisms prevent the potentially 
harmful build-up of metabolites. The great 
explosion of interest in biological regulatory 
mechanisms in the 1960s followed the realiza- 
tion that negative feedback in metabolism 
operates in the same way~” as in engineered 
systems, by allowing the output of an end 
product to match demand‘ (Fig. 1a). 

From an evolutionary perspective, what 
matters in metabolism is not so much the 
‘overflow’ that occurs in the absence of bio- 
logical feedback controls, but the build-up 
of compounds that arises when the require- 
ment for a metabolite is too low to be handled 
adequately by feedback inhibition (Fig. 1b). 
This is difficult to test experimentally, but one 
can mimic the situation by studying mutant 
bacteria in which the feedback mechanism is 
suppressed. For example, mutants of Escheri- 
chia coli bacteria that are insensitive to feed- 
back inhibition of the production of the amino 
acid lysine have been studied’, and behaved as 
expected: the cells grew poorly, probably as 
a result of lysine accumulation. By contrast, 
Reaves et al. observed that the expected over- 
flow does not occur in mutant E. coli lacking 
feedback inhibition of the first enzyme in the 
biosynthetic pathway leading to pyrimidine 
nucleotides. Furthermore, the authors saw 
little effect of this lack of inhibition on the 
levels of the pathway’s end product, cytidine 
triphosphate (CTP), and they found that the 
growth rates of the bacteria were normal, 
except under energy-limiting conditions. 

CTP is essential for the synthesis of genes. 
Although it is produced from aspartate, an 
amino acid, it is about as different in struc- 
ture from aspartate as a metabolite can be. 
Enzymes are normally inhibited by molecules 
that resemble their substrates, and the discov- 
ery that the first step of CTP biosynthesis — 
the one that uses aspartate as a substrate — is 
inhibited by CTP was one of the factors that 
led to the recognition of feedback regulation 
in metabolism’. 

Surprisingly, Reaves and co-workers 
observed that neither CTP nor the biosyn- 
thetic intermediates leading to it accumulated 
in the mutant bacteria that lacked feedback 
regulation. So why not, and what happened to 
the excess CTP produced? The authors found 
that it was degraded and excreted in the form of 
uracil (an RNA base). They describe this mode 
of regulation as “directed overflow metabo- 
lism”; others have called it “catabolic demand”. 
The degradation involved a phosphatase 
enzyme that is evolutionarily conserved, but 
whose function was previously unknown. 
Metabolic regulation is most easily analysed 
in economic terms of supply and demand**, 
especially given that the primary function of 
feedback inhibition is to regulate metabolite 
concentrations, rather than fluxes. Biologi- 
cal responses to the demand for metabolic 
end products are common in many systems, 
thus explaining the occurrence of cooperative 
feedback inhibition, such as that by CTP in 
pyrimidine biosynthesis. But the degradation 
of CTP to uracil observed by Reaves et al. is not 
a response to demand for uracil, because there 
is no particular demand for it. Instead, this deg- 
radation is a response to the excessive supply of 
nucleotides, and allows the concentrations of 
CTP and its biosynthetic precursors to be kept 
essentially constant when demand for CTP is 
low (Fig. 1c). Regulation according to supply 
occurs in biological detoxication pathways, and 
also in central metabolism. In particular, the 
mammalian liver does not phosphorylate glu- 
cose to form glycogen — the polymer used for 
energy storage in mammals — to satisfy its own 
demand for energy, but to maintain glucose 
homeostasis in other organs by maintaining a 
constant concentration of glucose in the blood’. 
The CTP-production pathway is therefore 
an example of a system in which regulation by 
demand for end product occurs side-by-side 
with regulation by degradation and excretion 
of excess end products. Evolution cannot have 
generated and conserved the excretion mech- 
anism purely to compensate for the artificial 
deletion of feedback in experiments, so this 
pathway must represent a back-up strategy 
for physiological states in which demand falls 
below the finite range within which feedback 
is effective. If so, then we should expect to find 
examples of the same sort of behaviour in other 
pathways. 

Figure 1 | Regulating metabolite production. a, In this general metabolic pathway, a starting 
material (A) is converted to an end product (E) via intermediates (B to D). When demand for E is 
normal, regulation of the pathway occurs by feedback inhibition of the first step: E inhibits the conversion 
of A to B. The sizes of the letters indicate the amount of each compound that accumulates. b, At very low 
demand, feedback inhibition is ineffective (broken line). Production of E is therefore excessive, and so it 
and the intermediates B to D accumulate. c, Reaves et al.¹ report that homeostasis of end products can be 
maintained when demand is very low if any excess is degraded and the resulting products are excreted. The 
authors discovered such a mechanism in the metabolic pathway that produces CTP, a pyrimidine nucleotide. 
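
The three situations in Fig. 1 can be mimicked with a toy calculation. The Python sketch below is a hypothetical one-variable model, not the analysis of Reaves et al.: supply of the end product falls as its concentration rises (cooperative feedback inhibition, written here as a Hill function), demand removes it in proportion to its concentration, and an optional overflow term degrades anything above a threshold. All parameter values and function names are illustrative assumptions.

```python
def net_flux(e, demand, vmax, ki, hill, overflow_rate, overflow_threshold):
    """Net rate of change of end-product concentration e (arbitrary units)."""
    supply = vmax / (1.0 + (e / ki) ** hill)   # feedback-inhibited first step
    consumption = demand * e                    # use of the end product
    overflow = overflow_rate * max(e - overflow_threshold, 0.0)  # excess degraded
    return supply - consumption - overflow


def steady_state(demand, vmax=1.0, ki=1.0, hill=4.0,
                 overflow_rate=0.0, overflow_threshold=1.0):
    """Concentration at which supply balances removal (requires demand > 0)."""
    args = (demand, vmax, ki, hill, overflow_rate, overflow_threshold)
    lo, hi = 0.0, 1.0
    while net_flux(hi, *args) > 0.0:   # widen the bracket until removal wins
        hi *= 2.0
    for _ in range(100):               # bisection: net flux decreases with e
        mid = 0.5 * (lo + hi)
        if net_flux(mid, *args) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


normal = steady_state(demand=1.0)                              # as in Fig. 1a
low_demand = steady_state(demand=0.001)                        # as in Fig. 1b
with_overflow = steady_state(demand=0.001, overflow_rate=1.0)  # as in Fig. 1c

print(f"normal demand, feedback only:   {normal:.2f}")
print(f"very low demand, feedback only: {low_demand:.2f}")
print(f"very low demand, with overflow: {with_overflow:.2f}")
```

With these made-up parameters, feedback alone lets the end product pile up several-fold when demand collapses, whereas the overflow term holds it close to its normal level, at the cost of discarding material that the cell has paid to make.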

One other point will have occurred to alert 
readers. If excess production of pyrimidine 
nucleotides is overcome by converting the 
excess into uracil and excreting it, this implies 
that nutrients are not being used efficiently 
by the bacteria. That is why the mutant bac- 
teria grew as well as normal ones when energy 
supplies were abundant, but somewhat more 
slowly when energy was limited. 
The handiwork 
of tinkering 


A comparison of regulatory DNA sequences in humans, macaques and mice during 
embryonic limb development reveals thousands of sites of enhanced regulatory 
activity that are likely to have driven the evolution of our characteristic anatomy. 


PAUL FLICEK 


Understanding what makes us human 
is a long-standing research goal that 
has been made more accessible in 
the genomic era. It is clear that specific features 
of human anatomy are encoded in the genome, 
and the extent of gene sharing among mam- 
mals suggests that most of these features result 
from gene-regulatory changes that occurred at 
some point in the human lineage. Writing in 
Cell, Cotney et al.¹ have provided the first 
comprehensive map of the genomic regions 
responsible for the unique morphology of 
human limbs. They also show that the vast 
majority of gains of regulatory activity in the 
human lineage occur in genomic regions that 
trace back to the base of the mammalian tree. 
These results demonstrate once again that, as 
Nobel-prizewinning biologist François Jacob 
described it’, evolution is a tinkerer using the 
material at hand to create something workable. 

Figure 1 | The evolutionary origins of mammalian and human limb 
development. Cotney et al.¹ studied limb tissue from mice, macaques 
and humans at various stages of embryonic development (equivalent to 
human embryonic day 33 (E33), E41, E44 and E47) to identify active 
gene-regulatory sequences (enhancers and promoters). a, Their results 
show that, as development proceeds, the fraction of human enhancers 
that were active in human tissue only (human-gain enhancers) increased, 
demonstrating the specific needs of human limb development. The 
other categories are enhancers that were active in all three species (stable 
mammalian enhancers) and enhancers that are present but not active in 
all three species (orthologues). b, The study also reveals that many of the 
sequences that are stably active in the three mammals are also present in 
non-mammalian vertebrate genomes, whereas the human-gain enhancers 
are, on average, younger. 

Compared with other mammals and pri- 
mates, humans have several specific limb 
characteristics, including the relative length- 
ening of the thumb and multiple adaptations of 
the foot to support walking upright. These and 
other developmental details are evident early 
in embryonic development, and so the relevant 
regulatory regions of our genomes must be 
active at specific developmental times to drive 
the appropriate physiology. 

Cotney and colleagues set out to map these 
regions in the human genome using com- 
parative regulatory genomics, the recently 
developed technique of performing the same 
functional experiment in matched tissues of 
multiple species to gain insight into regulatory 
evolution’. What makes this study so signifi- 
cant is that the experiments were conducted on 
embryonic limb tissue, with careful matching 
of multiple developmental stages in samples 
from humans, macaques and mice. At each 
stage, the authors identified genomic loca- 
tions at which the lysine amino-acid residue 
at position 27 of histone 3 proteins (histones 
are proteins around which DNA is wound) was 
modified by acetylation. This mark, referred to 
as H3K27ac, identifies* active gene-regulatory 
regions, such as promoters and enhancers. 
H3K27ac markings at regions regulating the 
expression of HOX genes, which have a deeply 
evolutionarily conserved role in development, 
provided a natural control, and inspire con- 
fidence that the profiled tissues were indeed 
from the same developmental stage across the 
three species. 

These maps of promoter and enhancer activ- 
ity show the progression of regulatory innova- 
tion with evolution. The limb bud at human 
embryonic day 33 was the earliest stage pro- 
filed by Cotney et al., and was also the stage 
at which the fewest promoter sequences (2%) 
showed H3K27ac marks in human samples but 
not at the related (derived from the same ances- 
tor) sequence in macaque tissues; such com- 
parisons were referred to as human-lineage 
gains. The latest developmental stage profiled 
by the authors, corresponding to human 
embryonic day 47, focused on the hand and 
foot plates; at this stage, Cotney et al. identified 
16% of promoters with human-lineage gains. 
These results are in accordance with the intui- 
tive idea that later developmental stages will 
demonstrate more human-specific morpho- 
logical features (Fig. 1a). In total, more than 
2,000 promoters and nearly 3,000 enhancers 
showed significant human-lineage gains at 
least one assayed time point, demonstrating 
the dramatic orchestration of gene expression 
that is required for human limb development. 

In their comparisons between species, the 
authors classified enhancer regions into three 
categories: orthologous (an enhancer sequence 
that simply exists in all three species); those that 
exist and are stably marked with a consistent 
level of H3K27ac in all three species; and those 
that exist in all three species but that show a 
gain of H3K27ac in the human lineage. 

The characteristics of the sequences that fit 
into these classes, including their evolutionary 
age and conservation, provide some seemingly 
surprising results. First, the majority of human- 
lineage gains do not occur in highly conserved 
elements, although they do not seem to be 
completely unconstrained; rather, the great- 
est sequence conservation is seen in the stably 
marked regions. The stably marked and orthol- 
ogous regions are also significantly older, in 
evolutionary terms, than those in which activ- 
ity arises in the human lineage, with many also 
being present in non-mammalian vertebrates 
(Fig. 1b). Human-lineage gains, on the other 
hand, tend to be found in sequences that arose 
between the time that marsupials and placental 
mammals shared a common ancestor and the 
divergence of the placental lineage. A few of the 
sequences identified as human-lineage gains 
also overlap with regions previously identified 
as showing significant change since the diver- 
gence of the human lineage from chimpanzees’. 
Cotney and colleagues’ findings provide clues 
to the roles of 16 of these ‘human acceler- 
ated regions’, and these may now be attractive 
candidates for further analysis. 

Exactly what is going on in the regions with 
enhanced regulatory activity in the human 
lineage is still not known. The sites had no 
obvious enrichments for specific transcrip- 
tion-factor-binding motifs. Also absent were 
specific repetitive elements, which have been 
shown to contribute to the regulatory rewiring 
of multiple mammalian lineages® and to be a 
contributing factor to the evolution of preg- 
nancy’. Understanding the molecular drivers 
at these and other certain-to-be-discovered 
regions of human regulatory change is both a 
formidable and an exciting challenge. 

There are limitations to Cotney and col- 
leagues’ analysis. For example, the quality of the 
genome sequences varies between the species, 
which may have contributed to the authors’ ina- 
bility to find orthologous regions in macaques 
for some human sites. Alternative explanations 
for this lack of orthologues, based on changes 
in copy number or human-lineage duplica- 
tions’, are also complicated by the absence or 
possible misassembly of these regions in the 
human genome sequence. Moreover, there are 
problems associated with mapping and analysis 
of the short DNA sequences resulting from 
the ChIP-seq analysis used by the authors to 
sequence regions containing H3K27ac marks. 
However, the study provides clear insight into 
the regulatory changes that help to make us 
human, and the authors have presented an 
extremely valuable map, connecting regulatory 
regions and gene-expression changes involved 
in human limb development. 
Solution proposed for 
ice-age mystery 


The ice sheets retreated 10,000 years ago during a peak in solar radiation, but this 
peak was no larger than previous ones. A modelling study suggests why the ice 
sheets were unusually vulnerable to melting at that time. SEE LETTER P.190 


SHAWN J. MARSHALL 


I encountered Ice Ages: Solving the 
Mystery¹, the seminal book by John and 
Katherine Imbrie, as an undergraduate stu- 
dent, and it played no small part in drawing me 
in to graduate studies on ice-age climate dynam- 
ics. Imbrie père et fille describe the various 
strands of evidence establishing that Earth-Sun 
orbital variations are the main driver of glacial 
cycles: the recurring flow and ebb of ice sheets 
over the continents during an ice age. About 
40 such glacial cycles have shaped our planet 
over the past 2.6 million years (the Quaternary 
period), representing the most dramatic exam- 
ple of climate variability in Earth’s recent history. 

But there is one nagging problem: as much 
as Earth’s orbital wobbles seem to pace the 
advance and retreat of ice sheets, many aspects 
of ice-age climate dynamics remain a mystery. 
For one thing, those who model climate and 
ice sheets have not yet been able to simulate 
glacial cycles in a realistic way. Glacier advance 
into mid-latitudes requires severe cooling and 
increased snowfall compared with present- 
day conditions, to an extent that far exceeds 
the predicted response of the Earth system to 
‘cold’ orbital configurations in climate models. 
It is even more difficult to get rid of continen- 
tal ice sheets once they gain a foothold on the 
landscape. The modelling results reported by 
Abe-Ouchi et al.’ in this issue may provide a 
solution to these problems. 

The crux of the challenge in modelling gla- 
cial cycles is that Earth’s response to orbital 
forcing is entirely out of proportion. Changes 
in Earth’s tilt axis and the eccentricity of its orbit 
around the Sun give rise to geographical and 
seasonal changes in incoming solar radiation. 
The global annual impact of these variations 
is negligible, but what really matters to the ice 
sheets is the amount of sunlight at high north- 
ern latitudes during the summer melt season. 
Peak radiation in this region varies by up to 
100 watts per square metre because of orbital 
variations (Fig. 1a); this would certainly affect 
Arctic ice cover. However, integrated summer 
radiation, which is what counts in ice-sheet 
melting’, has deviated by less than 10% from 
present-day values over the most recent glacial 
cycle (Fig. 1b), and it is not obvious why this 
has elicited such a large shift in global climate. 

In fact, a host of positive feedbacks — cool- 
ing influences associated with increases in 
snow and ice cover — conspire to amplify the 
orbital signal and send the world careering into 
glaciations. It is difficult to overcome these 
cooling influences, and so orbital changes 
alone are not enough to trigger deglacia- 
tion. The most recent glaciation persisted for 
roughly 100,000 years, and the ice sheets sur- 
vived several periods of orbital warming before 
they finally destabilized and withdrew, start- 
ing about 20,000 years ago (Fig. 1c). At that 
time, summer solar radiation in the Northern 
Hemisphere increased, eventually peaking at 
about 6% above modern levels 10,000 years 
ago. But similar peaks occurred earlier during 
this period of glaciation, so what was different 
about this one? 
Figure 1 | Solar radiation and ice-sheet coverage. 
a, The average daily incoming solar radiation 
(Q,) at 60° N from May to September varies as a 
result of fluctuations in Earth's orbit around the 
Sun, as revealed by these data for 116,000 years 
before present (116 kyr BP; the inception of the last 
glacial period), 10 kyr BP (the period of maximum 
insolation during the most recent deglaciation) 
and the present day’. b, The integrated summer 
insolation (Iyj,) at 60° N during the last glacial 
cycle reveals several peaks. c, Stacked benthic 
stable isotope ratios (δ¹⁸O) from the global ocean 
are a proxy for global ice-sheet volume during the 
glacial-interglacial cycle’. Comparison of b with 
c reveals that the insolation peak that triggered 
deglaciation was only as large as other insolation 
peaks that did not induce deglaciation. Abe-Ouchi 
et al.’ report that the geometry of North America 
and the time taken for bedrock to sink beneath ice 
sheets explain why deglaciation occurred when it did. 


Through asynchronous coupling of sophis- 
ticated climate and ice-sheet models, Abe- 
Ouchi and co-authors make a convincing case 
that the geometry of North America and the 
long response time of isostatic compensation 
(the change in height of Earth’s surface in 
response to ice-sheet formation and retreat) 
are the main agents that transform 19,000-year 
(19-kyr), 23-kyr and 41-kyr orbital variations 
into a 100-kyr Earth-system response’. 
Ice sheets build up and flow southwards in 
both North America and Eurasia, taking 
many millennia to thicken and advance to 
their southern limits. Subglacial bedrock 
is depressed as underlying mantle material 
flows slowly outwards. At equilibrium, a 
3,000-metre-thick ice sheet undergoes about 
1,000 metres of subsidence’, but achieving 
equilibrium takes thousands of years. Simi- 
larly, land that was underneath the glacial ice 
sheets is still springing back. 
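
The slow approach to isostatic equilibrium can be illustrated with a simple relaxation curve. The Python sketch below assumes, purely for illustration, that an ice load appears instantly on undeformed bedrock and that the bedrock then sinks exponentially towards its equilibrium depression. The one-third ratio follows from the figures quoted above (about 1,000 metres of subsidence under 3,000 metres of ice); the 5,000-year response time and the function name are assumptions, chosen only to be consistent with 'thousands of years'.

```python
import math


def bedrock_depression(t_years, ice_thickness_m,
                       tau_years=5000.0, depression_ratio=1.0 / 3.0):
    """Bedrock subsidence (m) a time t after an ice load is applied at t = 0.

    tau_years is an assumed relaxation time; depression_ratio is the
    equilibrium subsidence per metre of ice (about 1,000 m per 3,000 m).
    """
    equilibrium = depression_ratio * ice_thickness_m
    return equilibrium * (1.0 - math.exp(-t_years / tau_years))


ICE_THICKNESS_M = 3000.0

for t in (1000, 5000, 10000, 20000):
    sunk = bedrock_depression(t, ICE_THICKNESS_M)
    # The ice surface drops by the same amount that the bedrock sinks.
    print(f"after {t:>6} years: bedrock down {sunk:6.0f} m, "
          f"ice surface lowered by the same {sunk:6.0f} m")
```

The point of the sketch is the lag: in this toy setting the surface of a thick ice sheet keeps dropping into warmer air for many millennia after the ice has built up, which is the delayed negative feedback that the simulations rely on.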

Isostatic subsidence is one of the few nega- 
tive feedbacks associated with glaciation: as an 
ice sheet slowly sinks, its surface lowers into 
a warmer climate, increasing the amount of 
melt and the area of the ice sheet exposed to 
melting. In Abe-Ouchi and colleagues’ simula- 
tions, this process becomes most effective late 
in the glacial cycle, when the North American 
ice sheets are thick and have advanced far 
enough south; because this takes a long time, 
North America is set for a 100-kyr response. 
By contrast, the geography of the Eurasian ice 
sheets (which are thinner and less extensive, 
and occur in a warmer climate) gives them 
less inertia, and so they are more sensitive to 
20- and 41-kyr orbital variations. 

This idea is not new — earlier modelling 
studies*® also implicated isostatic rebound 
as one of the main processes underlying the 
100-kyr glacial cycle. However, free-running 
simulations of the cycle have never before 
been achieved without invoking ‘exotic mecha- 
nisms’ — such as imposed ocean-circulation 
changes, dynamic ice-sheet destabilization or 
‘dusting’ of the ice sheets — that force deglacia- 
tion at the ‘right’ time. One innovative tech- 
nique that helps to capture the glacial cycle in 
Abe-Ouchi and colleagues’ analysis is the use 
of multiple snapshots from climate models, 
which provide information about different 
ice-sheet sizes, carbon dioxide concentrations 
and orbital configurations. This is necessary 
because the computational time required to 
run a sophisticated climate model over tens of 
millennia is still prohibitively long. 

However, some lingering mysteries remain, 
such as the effects of the oversimplified treat- 
ment (or absence) of ice sheet–ocean inter- 
actions, basal flow (ice-sheet sliding and 
subglacial sediment deformation) and ice- 
stream processes in the authors’ simulations. 
Furthermore, ice-sheet melt rates are esti- 
mated only from air temperature, and are not 
based on energy-balance physics within the 
atmospheric model used by the authors. As 
climate and ice-sheet models become more 
sophisticated, we will see further refinement of 
these results. 

Moreover, Abe-Ouchi and colleagues’ 
findings do not explain the transition that 
took place 900,000 years ago, when the world 
moved from 41-kyr to 100-kyr glacial cycles. 
Isostatic time scales and North American 
geography did not change across this bound- 
ary, so another factor must have been at work. 
There are some layers yet to be explored in the 
mysteries of the ice age. 

PALAEONTOLOGY 


Jurassic fossils and 
mammalian antiquity 


Two new Jurassic fossils yield conflicting reconstructions of the mammalian tree. 
These divergent genealogies have profoundly different implications for the origin 
and early diversification of mammals. SEE ARTICLE P.163 & LETTER P.199 


RICHARD L. CIFELLI & BRIAN M. DAVIS 


Fossil discoveries and molecular studies in 
recent decades have greatly advanced our 
understanding of mammalian relation- 
ships and diversification’. Yet major points of 
disagreement remain over some of the basal 
branches of the family tree. There is little 
doubt that mammals, strictly defined’, were 
widespread and ecologically diverse by the 
middle of the Jurassic period, about 165 mil- 
lion years ago”. But when did they originate? 
A major sticking point is the inclusion (or not) 
of certain poorly known early forms. Sub- 
stantial information is now provided by two 
separate discoveries, reported in this issue 
by Zhou et al.° (page 163) and Zheng et al.’ 
(page 199), of splendidly preserved fossils from 
China that date to between 160 million and 
165 million years ago. 

160 | NATURE | VOL 500 | 8 AUGUST 2013 

© 2013 Macmillan Publishers Limited. All rights reserved 

Both fossils, which include evidence of fur 
but lack complete skulls, have been assigned 
to the Haramiyida. This enigmatic group 
includes fossils dating back to the Late Tri- 
assic — that is, about 40 million to 50 mil- 
lion years before the appearance of undoubted 


mammals”. Intriguingly, haramiyidans have 
some rodent-like specializations of the teeth, 
and in this respect they vaguely resemble 
members of another extinct group, the Multi- 
tuberculata, which is well known and has 
consistently been placed firmly within the 
mammalian tree*’. Although the new fossils 
seem to be related, detailed examination 
reveals significant differences between the two 
animals — differences that lead the authors to 
contrasting views of mammalian relationships 
which imply quite different models for the 
origin and initial diversification of these iconic 
organisms. 

Zheng and colleagues describe Arboro- 
haramiya, a short-faced presumed omnivore 
or herbivore that was adapted for life in the 
trees (as implied by the name). Notably, the 
authors interpret the back of the jaw to be 
simple (although this is not clear from the 
photographs in the paper’s Supplementary 
Information), indicating that the bones of the 
middle ear had assumed a typical mamma- 
lian configuration®. Surprisingly, the results 
of their analysis place haramiyidans (includ- 
ing Arboroharamiya) and multituberculates 
together within Mammalia (Fig. 1a). This 
implies that mammals originated at least 
215 million years ago — a much earlier date 
than many palaeontologists would accept”, 
but one that is in agreement with a recent esti- 
mate based on a molecular-clock model’. 

Megaconus, described by Zhou and col- 
leagues, comes from the same geological for- 
mation as Arboroharamiya, but is slightly older. 
Apparently adapted for terrestrial life and a 
plant-based diet, Megaconus is primitive in 
many respects: the counterparts of the mam- 
malian middle-ear bones remain attached to 
the jaw, for example, and the ankle resembles 
that of pre-mammalian forms. The authors’ 
analysis places haramiyidans (including Mega- 
conus) outside Mammalia and unrelated to 
multituberculates (Fig. 1b). Under this inter- 
pretation, the minimal age for mammalian 
origin is much younger, and is constrained by 
the appearance of multiple, uncontested types 
of mammal by about the Middle Jurassic*’. 

The evolutionary history of various mammalian
groups has been characterized as ‘long
fuse’, ‘short fuse’ and ‘explosive’, depending
on the time elapsed between their origin
and significant diversification. Zheng and
colleagues’ tree, which places the origin of 
mammals at 40 million to 50 million years 
before their Jurassic radiation, implies a long- 
fuse model for the group as a whole (Fig. 1a). 
Zhou and colleagues’ competing hypothesis, 
on the other hand, is consistent with an explo- 
sive model (Fig. 1b). These alternative family 
trees and implied models of diversification 
have profound ramifications for interpreting 
the significance of key adaptations, ecologi- 
cal interactions and other events in the early 
history of mammals**"". 
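
The contrast between the two models comes down to simple arithmetic on the dates quoted in the text and in Figure 1. The sketch below is illustrative only: the function name is hypothetical, and placing the origin at 175 million years ago under the Zhou et al. tree is an assumption made for the example, since the text says only that the radiation followed the origin immediately.

```python
# Dates in millions of years ago (Ma), as quoted in the text and Figure 1.
LATE_TRIASSIC_ORIGIN_MA = 215       # origin implied by the Zheng et al. tree
MIDDLE_JURASSIC_RADIATION_MA = 175  # first uncontested mammals


def fuse_length_myr(origin_ma, radiation_ma):
    """Time elapsed (Myr) between a group's origin and its diversification."""
    if origin_ma < radiation_ma:
        raise ValueError("origin must not be younger than the radiation")
    return origin_ma - radiation_ma


# Long-fuse scenario: Late Triassic origin, Middle Jurassic radiation.
long_fuse = fuse_length_myr(LATE_TRIASSIC_ORIGIN_MA, MIDDLE_JURASSIC_RADIATION_MA)

# Explosive scenario. Assumption for illustration: origin and radiation
# both fall at the start of the Middle Jurassic.
explosive = fuse_length_myr(MIDDLE_JURASSIC_RADIATION_MA, MIDDLE_JURASSIC_RADIATION_MA)

print(long_fuse, explosive)
```

With these inputs the long-fuse lag sits at the lower end of the 40 million to 50 million years cited above, whereas the explosive lag is effectively zero.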

Figure 1 | Alternative interpretations of early mammalian history. Discoveries identified as
uncontested mammals appear in the fossil record by the beginning of the Middle Jurassic (175 million years
ago). Close relatives have been found in rocks dating to the Late Triassic (about 215 million years ago),
but most palaeontologists place these groups outside Mammalia. Zhou et al. and Zheng et al. describe
fossils that are both assigned to one such group, the Haramiyida. Although the fossils are relatives, the two
teams reach different conclusions as to where haramiyidans fit in the scheme of mammalian relationships,
with strikingly different implications for the origin and diversification of mammals. a, Zheng et al. nest
haramiyidans within Mammalia, implying a Late Triassic origin for mammals and a long-fuse model
of mammal diversification. (Early mammalian splits in a are extrapolated from the first appearance of
haramiyidans and a marsupial–placental split on the basis of the age of the extinct genus Juramaia.)
b, By contrast, Zhou et al. place haramiyidans outside Mammalia, suggesting that the initial radiation of
mammals occurred explosively during the Middle Jurassic, immediately following their origin. (Sequential
mammalian splits in b are based on the ages of the extinct genus Henosferus, the earliest multituberculates
and Juramaia, respectively.) Skull reconstructions from refs 4 and 6.

Neither family tree perfectly explains all the
data, of course. Indeed, each implies seemingly
unlikely examples of independent evolution
or reversal (homoplasy) of key characteristics.
Zheng and colleagues’ model, for instance, 
suggests that the complex, three-boned mid- 
dle ear, long regarded as a diagnostic feature 
of mammals’, may have arisen independently 
at least three times — in the multitubercu- 
late-haramiyidan group, in monotremes 
(platypuses and echidnas) and in therians 
(marsupials and placentals). Similarly, the tree 
proposed by Zhou et al. requires independent 
evolution of strikingly similar tooth features — 
the presence of a single pair of enlarged, for- 
ward-facing incisors and complex cheek teeth 
with multiple rows of cusps, for example, that 
are presumably associated with herbivory — in 
haramiyidans and multituberculates. 

These contrasting results, and their implica- 
tions for mammalian evolution, should be con- 
sidered in light of the underlying data. Of the 
two fossils, Megaconus is the more complete, 
with 41% of anatomical characters scored, 
compared with 23% for Arboroharamiya. In 
addition, the analysis by Zhou et al. is more 
comprehensive in its sampling of species and 
anatomy, particularly with respect to harami- 
yidans and multituberculates. 

The obvious next step, of course, is to con- 
duct analyses to synthesize and reconcile the 
data presented in these two contributions. 


Ultimately, however, more and better fossils, 
ideally including skulls, will be needed to 
refine knowledge of early mammalian radia- 
tions, both in terms of relationships and 
palaeobiology.
ARTICLE


doi:10.1038/nature12429 


A Jurassic mammaliaform and the earliest 
mammalian evolutionary adaptations 


Chang-Fu Zhou, Shaoyuan Wu, Thomas Martin & Zhe-Xi Luo


The earliest evolution of mammals and origins of mammalian features can be traced to the mammaliaforms of the Triassic 
and Jurassic periods that are extinct relatives to living mammals. Here we describe a new fossil from the Middle Jurassic 
that has a mandibular middle ear, a gradational transition of thoracolumbar vertebrae and primitive ankle features, but 
highly derived molars with a high crown and multiple roots that are partially fused. The upper molars have longitudinal 
cusp rows that occlude alternately with those of the lower molars. This specialization for masticating plants indicates 
that herbivory evolved among mammaliaforms, before the rise of crown mammals. The new species shares the 
distinctive dental features of the eleutherodontid clade, previously represented only by isolated teeth despite its 
extensive geographic distribution during the Jurassic. This eleutherodontid was terrestrial and had ambulatory gaits, 
analogous to extant terrestrial mammals such as armadillos or rock hyrax. Its fur corroborates that mammalian 
integument had originated well before the common ancestor of living mammals. 


Clade Mammaliaformes' 
Order Haramiyida 
Family Eleutherodontidae** 
Megaconus mammaliaformis gen. et sp. nov. 


Etymology. Mega (Greek): large; conus (Latin): cusp, after the single 
hypertrophied anterior cusp of lower premolars; mammaliaformis: in 
reference to ancestral mammalian features. 

Holotype. Paleontological Museum of Liaoning at Shenyang Normal 
University (PMOL) AM00007A and AM00007B (Figs 1, 2 and Sup- 
plementary Figs 1-3), preserved with in situ dentition, mandibles and 
associated mandibular middle ear, and most of the postcranial skeleton. 
Life Science Identifier (LSID): urn:lsid:zoobank.org:act:852BDC75- 
5D40-4486-990F-2F3682B40FF6. 

Locality and age. The Daohugou site in the Tiaojishan Formation of 
Inner Mongolia, China; the locality was directly dated to be 165-164 
million years (Myr) old*°. The mammaliaform assemblage from 
Daohugou includes the docodont Castorocauda’, the triconodontid 
Volaticotherium® and a pseudo-tribosphenic mammal’. The Tiaojishan 
Formation also yielded a eutherian’® from another locality, at a higher 
stratigraphic level, which was separately dated to be 161 ± 1.44 Myr.
Diagnosis. I¹-C⁰-P²-M³/I₁-C₀-P₂-M₃ (Fig. 2), with a procumbent,
enlarged lower incisor I₁ and rhomboidal upper incisor I¹; each lower
premolar has an enlarged anterior cusp, and a talonid heel consisting
of two cusp rows: P₁ anterior cusp is hypertrophied and re-curved; P₂
talonid is basined. Molars are hypsodont (‘high-crowned’) by roots,
and each has multiple roots that are fused proximally but divided
distally. Lower molars have two longitudinal rows, each of multiple
cusps mostly of pyramidal shape, plus a labial cingulid. Upper molar
M¹ has three rows, each of four cusps; M² has full labial and middle
rows, plus a short lingual row of only two cusps and restricted anteriorly,
forming the anterolingual wing. M³ has only two rows, of
which the lingual row is longer. The lingual cusp row on M¹-M² is
lingually offset from the M³ (Fig. 2 and Supplementary Fig. 2). Both
upper and lower molars are interlocked either by an embayment on 
the cingulum to receive the preceding tooth, or by tongue-in-groove 


interlock on the hypsodont roots. Megaconus mammaliaformis is 
identical to Eleutherodon’ and Sineleutherus'*"* in terms of highly 
distinctive hypsodont and proximally fused multiple roots, and inter- 
locking features between adjacent molars (Fig. 2 and Supplementary 
Figs 4, 5). Megaconus is different from other eleutherodontids in lack- 
ing the dense flutings on the tooth crown, and is distinguishable from 
all mammaliaforms and cynodonts of the Jurassic and Cretaceous 
periods by a combination of autapomorphic and plesiomorphic skull 
and postcranial features, and from all clades of crown mammals by 
many characteristics (Figs 1 and 3; differential diagnosis in Supplemen- 
tary Information). 


Phylogenetic relationship 


Placement of haramiyidans (including Megaconus and other eleuthero- 
dontids) is broadly relevant to the earliest evolution of mammalian 
features and timing of origin of basal clades of mammals. The complete 
dentition, mandible and partial skeleton of Megaconus here provide 
new characteristics for testing alternative phylogenetic hypotheses of 
haramiyidans, previously based on teeth and their occlusal features**!*””. 
According to our analysis of a matrix of 475 characters and 110 clades 
(considerably expanded from refs 10, 18), including all observed fea- 
tures in Megaconus, and expanded sampling of relevant eleuthero- 
dontids, haramiyidans and multituberculates, Megaconus is nested 
within haramiyidans among mammaliaforms, outside of crown Mam- 
malia, and is separated from multituberculates nested in the crown 
Mammalia (Fig. 4 and Supplementary Fig. 9). The hypothesis that 
haramiyidans (including Megaconus) are a mammaliaform clade out- 
side of the crown Mammalia is the most parsimonious given our data 
set (Supplementary Information). By nonparametric test, the best strict 
consensus tree (Supplementary Fig. 9) is 16 steps shorter than, and 
significantly different (P = 0.0389) from, the suboptimal ‘allotherian’
hypothesis that would group haramiyidans with multituberculates in
one clade.
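
To illustrate what it means for one tree to be a number of "steps" shorter than another, the sketch below counts character-state changes with the standard Fitch small-parsimony procedure on two toy trees. It is a minimal sketch under stated assumptions, not the authors' analysis: the five taxa stand in for whole groups, the three binary characters and their scores are invented for illustration, and the real matrix has 475 characters and 110 clades. The function and variable names are hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree:   nested 2-tuples with taxon names (str) at the tips.
    states: dict mapping each taxon name to its character state.
    """
    def visit(node):
        if isinstance(node, str):            # tip: state set is the observed state
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        common = left_set & right_set
        if common:                           # children agree: no extra step
            return common, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Total number of steps summed over all characters in the matrix."""
    return sum(fitch_steps(tree, character) for character in matrix)


# Toy topologies (assumed for illustration).
TREE_SEPARATE = ("cynodont", ("haramiyidan",
                 ("monotreme", ("multituberculate", "therian"))))
TREE_ALLOTHERIAN = ("cynodont", ("monotreme",
                    (("haramiyidan", "multituberculate"), "therian")))

# Invented scores: 0 = primitive state, 1 = derived state.
TOY_MATRIX = [
    # middle ear detached from the mandible
    {"cynodont": 0, "haramiyidan": 0, "monotreme": 1, "multituberculate": 1, "therian": 1},
    # molars with multiple longitudinal cusp rows
    {"cynodont": 0, "haramiyidan": 1, "monotreme": 0, "multituberculate": 1, "therian": 0},
    # elongate calcaneal tuber
    {"cynodont": 0, "haramiyidan": 0, "monotreme": 1, "multituberculate": 1, "therian": 1},
]

separate_length = tree_length(TREE_SEPARATE, TOY_MATRIX)
allotherian_length = tree_length(TREE_ALLOTHERIAN, TOY_MATRIX)
print(separate_length, allotherian_length)
```

In this toy case the tree that separates haramiyidans from multituberculates requires one step fewer, because the cost of evolving the multi-rowed molars twice is outweighed by the two skeletal characters. The same trade-off, at much larger scale, underlies the 16-step difference reported above.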

This placement of Megaconus has resulted primarily from mandibular 
and postcranial features. Megaconus is more plesiomorphic than living 
Figure 1 | New Jurassic mammaliaform Megaconus mammaliaformis.
a, Skeletal reconstruction. b, Holotype counterpart (Paleontological Museum of
Liaoning (PMOL)-AM00007B). c, Skeletal feature identification; the left (-l)
versus right (-r) sides are designated according to the main-part PMOL-AM00007A
(Supplementary Fig. 1). d, Manual terminal phalanges. Details on


mammals in having the postdentary trough to hold the mandibular 
middle ear (Fig. 2 and Supplementary Figs 2 and 3), but similar to most 
mammaliaforms ranging from Sinoconodon to docodontans'*'~**, and 
to Haramiyavia, in having the postdentary trough”. It differs from 
multituberculates in that the latter group has the middle ear fully 
separated from the mandible”. The mandible of Megaconus has a 
distinctive angular process in contrast to the rounded angular region 
of multituberculates. At the ankle, the calcaneus is positioned side by 
side with the astragalus, and lacks the long calcaneal tuber and laterally 
compressed calcaneal body, in contrast to multituberculates that have 
partial superposition of the astragalus over the compressed calcaneus 
with an elongate calcaneal tuber (Fig. 3). The astragalotibial joint of 
Megaconus lacks all of the derived features of multituberculates, such 
as separate medial versus lateral tibial facets to articulate the medial 
versus lateral distal tibial condyles for a highly mobile and flexible 
ankle of multituberculates”’. 


Implications for dental evolution of early mammals 


We note that eleutherodontids have interlocking features between 
adjacent molars (Supplementary Figs 4 and 5), as in many other 
(although not all) Triassic-Jurassic mammaliaforms and many crown 
mammals. Mammals are characterized by only one generation of 
tooth replacement of antemolar teeth, and their molars usually have 
fully divided roots and precise upper-to-lower crown occlusion, in 
contrast to the teeth of most pre-mammalian cynodonts (except for 
traversodontids and tritylodontids**), which have multiple replace- 
ments, lack full root division and lack precise occlusion. Fewer repla- 
cement(s) at a locus and slower ontogeny of individual teeth are 
crucial for mammalian teeth to achieve precise upper-to-lower occlu- 
sion, the basis for feeding specialization by distinctive and diverse 
molar crowns’, and by different growth patterns’’. These are funda- 
mental for evolutionary diversification of major clades. Megaconus 




dental and skeletal structures can be found in Supplementary Figures. 

C, cervicals; Ca, caudal vertebrae; CMME, preserved elements of cynodont 
mandibular middle ear**; D, dorsal vertebrae (D1-15 designated as ‘thoracic’; 
D16-24 as ‘lumbar’); r, ribs; S, sacral vertebrae; 3-5, the preserved manual 
terminal phalanges 3-5. 


adds to a growing body of evidence that mammaliaforms developed 
interlocking mechanisms between adjacent teeth for precise align- 
ment of teeth in a tooth row, most probably correlated with the 
mammal-like and slower ontogeny of a tooth and slower replacement 
than those of most cynodonts’. 

Teeth of living mammals can develop high enamel crowns, or a 
high dentine wall above the divided roots, as a consequence of varying 
growth rates and for specialized feeding adaptation’. Most mam- 
maliaforms, such as the haramiyid Thomasia, have brachydont 
(low-crowned) molars with fully divided roots’*. By contrast, eleu- 
therodontids (Fig. 2, Supplementary Figs 3-5 and Supplementary 
Video 1) have hypsodont and proximally fused multiple roots for 
molars, also known as dentine or root hypsodonty, convergent to 
the dentine hypsodont teeth of some eutherians, among diverse types 
of hypsodont teeth of living therians’’. Teeth erupt at a slower rate for 
a longer duration in hypsodont teeth in general, and the root growth 
phase of tooth ontogeny is longer for dentine hypsodont molars in 
particular'’. Thus dentine hypsodont molars of Megaconus indicate 
that mammaliaforms have a wide range of tooth growth patterns, 
some of which are convergent to phylogenetically unrelated taxa 
within living mammals. 

Several cynodonts and many mammals have molariform teeth with 
(usually three) longitudinal, multicusped rows of the upper teeth 
alternately occluding with (usually two) multicusped rows of the 
lowers (Fig. 4), with backward power stroke of the mandible for 
specialized feeding on plants'®”*°°. Similar (although not identical) 
dental patterns evolved in traversodontids and tritylodontids among 
pre-mammalian cynodonts”* (Fig. 4), in haramiyidans'’*””, in multi- 
tuberculates*”, and prominently in rodent placental mammals”. 

Ever since the discovery of the haramiyidans of the Late Triassic, it 
has been debated whether haramiyidans and multituberculates 
belong to a clade**’*'”*! or whether they are separate lineages’*. 
Figure 2 | Dental, mandibular and ear structures of Megaconus and
comparative taxa. a, b, Occlusal and lingual views of upper P?-M? (stereo
images from computed tomography scan, after correction for the fault through
M’; Supplementary Fig. 2 and Supplementary Video 1). c, Lingual view
reconstruction of I’ (alveolus only) and P’-M®. d, Occlusal view of P'-M?.
e, Occlusal view of lower P|-M; (Supplementary Video 1). f, Lingual view of
lower teeth (partially fused hypsodont roots in shading), dentary and
reconstructed mandibular middle ear. g, Labial view reconstruction of
mandible with middle ear. h, Multituberculate Lambdopsalis. i, Mandible and
middle ear of the mammaliaform Sinoconodon (h, i from ref. 24). CMME,
cynodont mandibular middle ear (ref. 24); DMME, definitive mammalian
middle ear. Yellow, ossified Meckel’s element (‘postdentary rod’);
green, malleus (articular) and surangular; blue, ectotympanic (angular); red,
incus (quadrate). Details in Supplementary Figs 2-5.

Figure 3 | Comparison of the hindlimb and pes
of Megaconus and other mammaliaforms.
a-d, Fused tibio-fibulas of obligatory terrestrial
mammals and Megaconus: hedgehog Erinaceus
(ambulatory terrestrial) (a); hyrax Procavia
(cursorial terrestrial) (b); aardvark Orycteropus
(ambulatory terrestrial) (c); armadillo Dasypus
(ambulatory, terrestrial) (d). a-d from ref. 37.
e, Megaconus, with fused tibio-fibula (posterior
view; fused distal tibio-fibula best seen in
Supplementary Fig. 8) and partially disarticulated
pes (proximal tarsals in dorsal view).
f, g, Mediolaterally compressed calcaneal tuber in
the multituberculates Eucosmodon (dorsal view,
from ref. 39) and Sinobaatar (ventral view).
h, Eutriconodont Jeholodens (dorsal and ventral
views (refs 39, 40)). i, Mammaliaform
Morganucodon (top, dorsal view; bottom, ventral
Figure 4 | Cynodont-mammal transition and evolution of mammal-like
postcanines with multicusped rows that occlude alternately between uppers 
and lowers, for omnivory-herbivory feeding adaptations. Eleutherodontids 
differ from multituberculates in molar counts, and in tooth positions of lingual 
offset tooth row pattern. Megaconus is another case of fur preserved in 
mammaliaforms, in addition to the case of Castorocauda, corroborating that 
the origins of fur occurred before the origin of crown Mammalia. 


The former hypothesis would posit that such complex molariforms 
evolved, only once, in a combined haramiyidan—multituberculate 
clade, whereas the latter phylogeny would lead to a scenario of such 
molariforms evolving twice, first with haramiyidans during the 
Triassic, and then in multituberculates during the Jurassic. After 
incorporating numerous features newly revealed by Megaconus, in 
addition to the classic dental features analysed in detail by previous 
studies'®**, our phylogeny (Fig. 4 and Supplementary Fig. 9) shows 
that haramiyidans (including Megaconus) are a mammaliaform clade 
outside of crown Mammalia, separated from multituberculates, which 
are deeply nested within Mammalia. The hypothesis that suggests that 
haramiyidans are stem mammaliaforms (Supplementary Fig. 9) is sig- 
nificantly different from the haramiyidan—multituberculate hypothesis 
by nonparametric test on our matrix (P = 0.00389). A key implication is 
that the ‘multituberculate-like’ molar pattern evolved in harami- 
yidans in convergence to multituberculates (Fig. 4). 

Although molars of haramiyidans and multituberculates show 
similar alternating occlusion of multicusped rows between the uppers 
and the lowers in backward power stroke of the mandible’’, we note 
several obvious differences between these groups: (1) the lower pre- 
molars of eleutherodontids (Fig. 2 and Supplementary Fig. 4) have a 
singular, enlarged and re-curved anterior cusp and a talonid heel that 
has two short cuspule rows (P₁) or a basin (P₂). These are fundamentally
different from the multituberculate lower premolars with crown-length
blade of multiple serrations with fine ridges on the sides of the
blade. Lower premolars of eleutherodontids are capable of puncturing,
in contrast to the bladed premolars of multituberculates specialized for
shearing; (2) basal multituberculates have a full premolar count of
five but only two molars, whereas eleutherodontids have only two
premolars but three molars (Fig. 4); (3) the lingual offset of the last
upper molar is the most important synapomorphy of all multituberculates:
the lingual cusp row of M² is offset from the lingual row of M¹
(refs 3, 16, 30). In Megaconus, however, it is M¹ and M² that are lingually
offset from the last molar M³ (Fig. 4). The intact dentition scanned
by computed tomography (Supplementary Figs 2 and 3) proves that 
the multiple rows alternately occluding between the upper and the lower 
molars are developed in different tooth positions in eleutherodontids, 
as compared to multituberculates; (4) eleutherodontids have multiple, 
proximally fused hypsodont roots, known only in Cretaceous gond- 
wanatherians among Mesozoic era mammals’. No multituberculates 
have this root pattern; (5) Megaconus lacks the anterior extension of 
the masseteric fossa onto the body of the mandible, as seen in multi- 
tuberculates for the posteriorly directed power-stroke”. Megaconus is 
similar to Haramiyavia in this plesiomorphic feature’’, and probably 
had a much weaker posterior power-stroke of the mandible. 


Skeletal features and habits 


Megaconus has a gracile postcranial skeleton (Fig. 1 and Supplemen- 
tary Fig. 1) and is estimated to have weighed about 250 g (Supplemen- 
tary Information). The vertebral column of Megaconus has 24 dorsal 
vertebrae, several more than those of most Mesozoic mammals 
(except some eutriconodonts'*** and most living mammals). This 
high dorsal vertebra count is comparable to hyracoids, perissodactyls 
and some xenarthrans among extant placentals*. Megaconus and 
multituberculates differ conspicuously in vertebral features: the thor- 
acolumbar boundary is distinctive and positioned at thoracic 13, and 
the anticlinal is positioned at thoracic (dorsal) 10 in multitubercu- 
lates**. In Megaconus, the thoracolumbar transition is gradational 
from dorsal vertebrae D16 to D20, the anticlinal vertebra is positioned 
at D21 (Fig. 1). The anticlinal marks the boundary of the posteriorly 
inclined neural spines of anterior dorsals, and the anteriorly inclined 
neural spines of posterior dorsals, corresponding to the regions of the 
anterior and posterior epaxial muscles. The difference of the anticlinal 
at D10 in multituberculates versus D21 in Megaconus indicates that 
Megaconus had more vertebral segments and a much longer anterior 
epaxial skeleton—-muscular region than multituberculates. 

Megaconus is more similar to the Triassic cynodont from the 
Manda Formation and Morganucodon**** than to multituberculates 
and therian mammals in the calcaneus and astragalus (Fig. 3). The 
calcaneus has an extensive peroneal shelf, a massive sustentacular 
region, but a short and ventrally directed calcaneal tuber continuous 
with the calcaneal body, all of which are similar to cynodonts (Fig. 3j, k)** 
but different from mammals including multituberculates, in which 
the tuber is distinctive from the main body of the calcaneus (Fig. 3f-i). 
The astragalus is plesiomorphically similar to cynodonts in the absence 
of the well-defined navicular facet of most mammaliaforms’’. An 
enlarged extratarsal spur, consisting of bony base (os calcares) and 
horny (keratin) spur (cornu calcares), shows that Megaconus had a 
poisonous spur similar to the living monotremes, now also known 
from docodont mammaliaforms’. 

The tibia and fibula are fused in Megaconus (Fig. 3e). The two bones 
are fused at both the proximal and the distal ends, as can be verified on 
both the slab and the counter-slab (Fig. 1 and Supplementary Fig. 1), 
but not along shafts. Fused tibio-fibula is a common feature of ter- 
restrial mammals (except for the arboreal tarsier)*’. Cursorial or bipe- 
dal jumping (terrestrial) mammals commonly show tibio-fibula fusion 
along much of the shafts in addition to a great reduction in the fibular 
part*®. But mammals with ambulatory (walking) gaits tend to retain an 
unreduced fibula fully separated along shafts from the tibia. The tibio- 
fibular fusion occurs only at the proximal and distal ends in Mega- 
conus as in armadillos (for example, Dasypus) that are terrestrial with 
ambulatory gaits. The tibio-fibula also bears resemblance to the ambulatory 
aardvark (Orycteropus) and cursorial hyraxes (Procavia and Heterohyrax),
in which the tibia and fibula are fused only at the proximal ends
and remain separate at the distal ends. Megaconus lacks an elongate
calcaneal tuber, which is an important in-lever for effective action of 
the Achilles tendon muscles on the ankle, as seen in multituberculates 
through crown Theria. The hypothesis stating that Megaconus is a 
terrestrial mammaliaform is consistent with the fact that both the manual 
and pedal terminal phalanges are generalized, with lateral ridges, but 
without the high and arched dorsal profile as in climbing mammals, or 
hypertrophied as in digging mammals. The claw keratin sheath pre- 
served as an impression with three claws has a wide outline but only 
slight dorsoventral curvature (Fig. 1c, 3-5), typical of extant terrestrial 
mammals. 

The Megaconus skeleton is preserved with the guard hairs as an 
impression in most areas of the halo around the skeleton that can be 
distinguished by the lighter colour of under fur residues. The abdomen 
appears to have areas of sparse hairs and naked skin with small skin 
foldings. Thus Megaconus probably had a naked abdomen, although it 
is not possible to speculate whether a pouch would have been present. 
Furs and other integumentary features (for example, poisonous spur) 
in eleutherodontids as a mammaliaform clade corroborate the wide- 
spread presence of mammalian integument among mammaliaforms, 
previously known only from a docodontan’, and that mammalian fur 
originated earlier in evolutionary history than the ancestor of extant 
mammals (Fig. 4). 
Connectomic reconstruction of the inner
plexiform layer in the mouse retina 


Moritz Helmstaedter¹†, Kevin L. Briggman¹†, Srinivas C. Turaga²†, Viren Jain²†, H. Sebastian Seung² & Winfried Denk¹


Comprehensive high-resolution structural maps are central to functional exploration and understanding in biology. For the 
nervous system, in which high resolution and large spatial extent are both needed, such maps are scarce as they challenge 
data acquisition and analysis capabilities. Here we present for the mouse inner plexiform layer—the main computational 
neuropil region in the mammalian retina—the dense reconstruction of 950 neurons and their mutual contacts. This 
was achieved by applying a combination of crowd-sourced manual annotation and machine-learning-based volume 
segmentation to serial block-face electron microscopy data. We characterize a new type of retinal bipolar interneuron 
and show that we can subdivide a known type based on connectivity. Circuit motifs that emerge from our data indicate a 
functional mechanism for a known cellular response in a ganglion cell that detects localized motion, and predict that 


another ganglion cell is motion sensitive. 


Information about neuronal wiring has long been the basis of formu- 
lating and testing ideas about how computation is performed by neural 
circuits. Complete'” and partial’ wiring diagrams are being used where 
available. Whether such diagrams can be created by statistical extrapo- 
lation or whether higher-order connectivity is functionally important is 
highly controversial®°. The assumption that mingling neurites connect 
(Peters’ rule’®) allows connectivity to be inferred from light-microscopic 
observations of sparsely stained tissue, but is frequently violated®’, show- 
ing that connectivity must be explicitly tested rather than inferred from 
proximity. Simultaneous electrical recordings from several cells can 
determine and quantify their synaptic connectivity'’"’, but do not allow 
a comprehensive sampling of connections. 

Unlike light microscopy, electron microscopy can follow even the thin- 
nest neurites through densely stained neuropil, and can detect unambi- 
guously whether two cells touch and over which area’. Serial section 
transmission electron microscopy was, for example, used to reconstruct 
the complete wiring diagram of the roundworm Caenorhabditis elegans'” 
and to study synaptic connectivity in the retina'*'’. Volume electron 
microscopy data sets hundreds of micrometres in extent’* have been used 
to reconstruct—guided by previous functional imaging—specific neural 
circuits>"®. 

The retina performs a variety of image-processing tasks and is one 
of the best studied parts of the central nervous system”. But, only in 
few cases, such as for direction sensitivity (reviewed in ref. 20), has the 
underlying neural computation been plausibly explained, combining 
information from anatomical studies, electrical recordings and two- 
photon calcium imaging*’*”?, 

Here we combined serial block-face electron microscopy (SBEM)* 
data, crowd-sourced manual annotation”, machine learning-based 
boundary detection, and automatic volume segmentation to reconstruct
the neurites of 950 neurons in a 114 µm × 80 µm area of the
inner plexiform layer (IPL) and all their contacts in that volume.

We establish the validity of the reconstruction using known circuits 
and demonstrate its use by classifying cells based on their electron- 
microscopy-resolution morphology, by isolating a new type of bipolar 
cell, by showing that the cell-to-type contact area is in some cases 


tightly controlled and can be used to augment type classification, and 
by uncovering several cases in which neurite co-stratification does not 
predict a contact. Among the functional implications of these findings 
is the prediction that a particular ganglion cell is motion sensitive. 


Imaging and reconstruction 


We used SBEM because a superior z-resolution’* and lack of image 
distortions makes SBEM data sets more easily traced by humans’ and 
segmented by computers. The main data set used in this study (e2006) 
has a volume of more than 1 million µm³, includes all layers that contain
intra-retinal synaptic connections, and was stained to enhance
plasma-membrane visibility’, further facilitating traceability and 
automated segmentation. 

Because completely labelling such a volume by hand would be 
prohibitively expensive (about US$10 million), we tried to establish 
an entirely automatic reconstruction pipeline. Our SBEM data can be 
automatically segmented into objects that represent the local cellular 
geometry accurately. But even at voxel error rates of a few per cent,
cells get fragmented into many pieces (M.H. et al., manuscript in 
preparation). Manually traced skeletons, on the other hand, do reli- 
ably establish intra-cellular continuity over large distances” but allow 
neither identification nor quantification of cell-cell contacts, and 
visually inspecting close skeleton encounters® is impractical for the 
large number (roughly 10°) expected in our data set. Therefore, we 
separately created skeletons for all cells by crowd-sourced manual 
annotation, which is much faster than manual volume tracing”, 
and volume segmentation (see below). 

Skeletons were created by a team of trained human annotators, 
which included, over time, more than 224 different students. First, the 
annotators identified all somata and classified them as photoreceptor 
(n > 2,000), glial (n = 173), horizontal (n = 33), bipolar (n = 496), amacrine
(n = 407) or ganglion (n = 47) cells, based on soma location and
emerging neurites (Fig. 1a). Starting from the somata, the annotators
skeletonized the neurites of all glial, bipolar (Fig. 1b), amacrine and
ganglion cells using the KNOSSOS program (http://www.knossostool.org).
Multiple tracings by different annotators (average redundancies: 6,


¹Max-Planck Institute for Medical Research, D-69120 Heidelberg, Germany. ²Department of Brain and Cognitive Sciences, Howard Hughes Medical Institute, Massachusetts Institute of Technology,
Cambridge, Massachusetts 02139, USA. †Present addresses: Max-Planck Institute of Neurobiology, D-82152 Martinsried, Germany (M.H.); National Institute of Neurological Disorders and Stroke, National
Institutes of Health, Bethesda, Maryland 20892, USA (K.L.B.); Gatsby Computational Neuroscience Unit, London WC1N 3AR, UK (S.C.T.); Howard Hughes Medical Institute, Janelia Farm Research Campus,
Ashburn, Virginia 20147, USA (V.J.).
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4 and 4 for ganglion, amacrine and bipolar cells, respectively), were 
automatically consolidated”, visually inspected and, in a few cases, 
manually corrected. A total of >20,000 annotator hours yielded 2.6m 
of skeletons, representing 0.64 m of neurite, with estimated” error rates 
of 9, 12 and 6 per ganglion, amacrine and bipolar cell, respectively. 


Cell types 

We classified all neurons into cell types by visual inspection of the bare 
skeletons, with a focus on the IPL. We found n = 459 almost complete 
bipolar cells (Fig. 1c; all reconstructed types and cells are shown in 
Supplementary Data 1 and 6, respectively). Most bipolar cells clearly 
belonged to one of the 10 types described previously** (Fig. 1c). 
However, particularly for OFF cone bipolar cells (CBCs) (1-4), some 
classification ambiguity remained, even after taking into account 
tiling. A random re-examination of 59 ON CBCs (CBC5-9) found 
one error. 

Seven cells showed no similarity to any of the ten bipolar types”®, but 
shared a distinct morphology and were designated as XBCs (Fig. 1d 
and Supplementary Data 2a). XBC axons stratify more narrowly but at 
the same average depth as CBC5 (Fig. 1d, e). Laterally, XBC axons 
roam widely, similar to CBC9, but their dendrites are comparatively 
compact, different from CBC9 (Supplementary Data 2b), and their
depth suggests that they contact cones. 

Figure 1 | Raw data, skeletons and bipolar cell analysis. a, Somata, from the
left: photoreceptor (grey), horizontal (green), bipolar (red), glia (yellow),
amacrine (blue) and ganglion (grey) cells. Also shown (white) are axons for two
CBC1, one CBC6 and two CBC7 cells. GCL, ganglion cell layer; INL, inner
nuclear layer; OPL, outer plexiform layer; PRL, photoreceptor layer. b, Side
views from two orthogonal directions onto a single CBC4 skeleton (top), and
light-axis (l.a.) views of dendrite (left) and axon (bottom). c, One example for
each bipolar cell type. d, All XBC skeletons, side view. e, Skeleton density
(segment length/vote count, normalized across IPL) versus depth for all bipolar
cell types (one profile shown for the entire CBC5 population). Inset: bipolar cell
prevalence (colours as for depth profiles). Scale bars, 10 µm.

The dendrites of all ganglion cells and of many amacrine cells
extended beyond the data set volume. Many ganglion and amacrine
cells could nevertheless be grouped by inspecting their neurites (12
ganglion cell types, Fig. 2a, b; 12 narrow-field amacrine cell types,
Fig. 2c, d; 33 medium/wide-field amacrine cell types, including 6
displaced types, Fig. 2e, f and Supplementary Data 1 and 6). We used
the type-averaged (for individual variations see Supplementary Data 1)
neurite density over depth in the IPL (Fig. 2) to create for all amacrine
and ganglion cell types unique identifiers (ac64-73, for example, is an
amacrine cell type with first and third quartiles at 64% and 73% IPL
depth, respectively). Prominent among cell types previously known
(see Supplementary Data 7 for a complete listing) are gc30-63, ac25-31
and ac60-65, corresponding to ON/OFF direction-selective ganglion
cells”? (DSGCs; Fig. 2a) and ON and OFF starburst amacrine
cells'® (SACs; Fig. 2e), respectively.
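The naming scheme for amacrine and ganglion cell types can be illustrated with a minimal sketch. It assumes the type-averaged depth profile is available as neurite density per bin of IPL depth (in per cent); the function name `type_identifier` and the binning are illustrative assumptions rather than the procedure actually used for the identifiers in the text.

```python
import numpy as np

def type_identifier(prefix, depth_percent, density):
    """Build an identifier such as 'ac64-73' from a neurite depth profile.

    depth_percent: IPL depth of each bin (0-100), in increasing order.
    density: neurite density in each bin (any non-negative units).
    """
    depth = np.asarray(depth_percent, dtype=float)
    weights = np.asarray(density, dtype=float)
    # Cumulative fraction of neurite density as a function of depth.
    cdf = np.cumsum(weights) / weights.sum()
    # First bin at which the cumulative fraction reaches each quartile.
    q1 = depth[np.searchsorted(cdf, 0.25)]
    q3 = depth[np.searchsorted(cdf, 0.75)]
    return f"{prefix}{int(round(float(q1)))}-{int(round(float(q3)))}"

# Toy profile: uniform density between 60% and 77% IPL depth.
depth = np.arange(0, 101)
density = np.where((depth >= 60) & (depth <= 77), 1.0, 0.0)
print(type_identifier("ac", depth, density))
```

With this toy profile the quartiles fall at 64% and 73% depth, so the printed identifier is `ac64-73`.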


Contact detection 


We next combined the skeletons with an automatic segmentation 
(Fig. 3), created by first training a convolutional network to detect cell 
boundaries”, followed by several growth and merge steps (Fig. 3a). The 
final volume consolidation into a representation of the cellular geo- 
metry was performed by combining for each cell all segments overlap- 
ping its skeleton (Fig. 3b, typically several hundred segments; total 
estimated volume error rate about 3%, see Methods). 
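The final consolidation step, collecting for each cell all segments that overlap its skeleton, can be sketched as follows. This is a minimal illustration, assuming the segmentation is stored as a 3D integer array of segment labels and the skeleton as a list of voxel coordinates; the function name, the array layout and the use of 0 as background are assumptions, not the implementation used in the study.

```python
import numpy as np

def cell_volume_mask(segment_ids, skeleton_nodes, background=0):
    """Return a boolean mask of all voxels belonging to segments hit by a skeleton.

    segment_ids: 3D integer array, one segment label per voxel.
    skeleton_nodes: iterable of (x, y, z) voxel coordinates of skeleton nodes.
    """
    nodes = np.asarray(skeleton_nodes, dtype=int)
    # Labels of the segments in which the skeleton nodes lie.
    touched = np.unique(segment_ids[nodes[:, 0], nodes[:, 1], nodes[:, 2]])
    touched = touched[touched != background]
    # The cell volume is the union of all touched segments.
    return np.isin(segment_ids, touched)

# Toy volume with three segments; the skeleton passes through segments 1 and 2.
seg = np.zeros((2, 2, 3), dtype=int)
seg[0, 0, :] = [1, 1, 2]
seg[1, 1, :] = [3, 3, 0]
mask = cell_volume_mask(seg, [(0, 0, 0), (0, 0, 2)])
print(int(mask.sum()))
```

Only skeleton nodes are sampled here; a fuller version would also sample along the edges between nodes so that small segments crossed between two nodes are not missed.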

Of 1,123 fully volume-reconstructed cells, 173 were glia, 110 were
orphans (one-of-a-kind cells or cells without a reasonable neurite
morphology), and 840 were the neurons used in the analysis. All contacts
(n = 579,724) between them were automatically detected and
quantified (Fig. 3c, Supplementary Data 5 and Methods). When testing
the reliability of the algorithm, we found that it missed none of 16
contacts visually identified in the raw data, and 20 randomly selected
algorithm-generated contacts contained only one false contact (caused
by debris in one image).

Figure 2 | Ganglion and amacrine cells. a, Normalized ganglion cell depth
profiles for gc31-56, gc36-51 (W3), gc30-63 (DSGC) and the remaining cell
types (grey). b, All three gc31-56 cells (somata: grey disks, side (top) and
light-axis (bottom) views), and all other inner nuclear layer and ganglion cell layer
(somata: black dots, side view only). c, Narrow-field amacrine cell depth
profiles for ac21-67, ac52-90 (A2) and remaining narrow-field amacrine cells
(grey). d, One example each for ac21-67 and ac52-90 (A2). e, Medium-field
amacrine depth profiles for ac19-30, ac25-31 (OFF SAC), ac35-41, ac60-65
(ON SAC), ac34-84 (A17) and remaining medium-field amacrine cells (grey).
f, Light-axis view for ac19-30. Scale bars, 10 µm.

Figure 3 | Automatic segmentation and contact
detection. a, From left: raw data (offset and
contrast adjusted), edge classifier (xyz average),
initial and iterated segmentations (see also
Supplementary Data 3e-g). b, From top: bare
skeleton, skeleton with overlapping segmentation
objects, and the resulting volume representation
for a CBC6 axon. c, Automatically detected contact
(red arrow) between a CBC5 cell and a DSGC
(gc30-63). d, Cross-sections through a non-synaptic
contact (left, Supplementary Data 3a, b)
and a ribbon synapse (right) from data set k563,
coloured by hand. e, Frequencies (bottom) of non-

The cell-to-cell contact-area matrix (Fig. 4a and Supplementary 
Data 4) includes only contacts that are individually below 5 µm²
(about 99.9% of all contacts), thus excluding touching somata and 
neurite bundles, and was then condensed into a type-to-type matrix 
(Fig. 4b and Supplementary Data 4). When exploring the circuit that 
couples rod photoreceptor signals into the cone pathway”, we found 
the A2 amacrine cell (ac52-90) contacting the rod bipolar cell (RBC) 
very strongly (with 23.6% of the A2 cell total detected neuronal con- 
tact area), contacting OFF CBCs quite well (6.4%, 4.3%, 1.8%, 3.2% 
and 2.9%, for types 1, 2, 3A, 3B and 4, respectively), and contacting 
ON CBCs more weakly (mostly CBC6 (2.0%) and CBC7 (1.4%), but 
not the XBC (0.2%)). The RBC, 38.3% of whose contact area is with 
the A2 cell, also strongly contacts (with 13.5%) ac34-84 (also known 
as an A17 amacrine cell)*’.
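How a cell-to-cell contact-area matrix can be condensed into a row-normalized type-to-type matrix is sketched below. The sketch assumes contacts are given as (cell index, cell index, area in µm²) tuples; the function names and data layout are hypothetical, and only the 5 µm² cut-off and the row normalization are taken from the text.

```python
import numpy as np

MAX_CONTACT_AREA = 5.0  # µm²; larger contacts (touching somata, bundles) are excluded

def cell_matrix(contacts, n_cells):
    """Sum the areas of all admissible contacts for every pair of cells."""
    m = np.zeros((n_cells, n_cells))
    for i, j, area in contacts:
        if area < MAX_CONTACT_AREA:
            m[i, j] += area
            m[j, i] += area  # contact area is symmetric
    return m

def type_matrix(cell_m, cell_types):
    """Condense a cell-to-cell matrix into row-normalized type-to-type fractions."""
    types = sorted(set(cell_types))
    index = {t: k for k, t in enumerate(types)}
    membership = np.zeros((len(cell_types), len(types)))
    for cell, t in enumerate(cell_types):
        membership[cell, index[t]] = 1.0
    # Total contact area between all cells of one type and all cells of another.
    # Assumption of this sketch: within-type pairs are counted in both directions.
    summed = membership.T @ cell_m @ membership
    row_totals = summed.sum(axis=1, keepdims=True)
    fractions = np.divide(summed, row_totals,
                          out=np.zeros_like(summed), where=row_totals > 0)
    return types, fractions

# Toy example with one cell per type; the 6.0 µm² contact is discarded.
contacts = [(0, 1, 0.6), (0, 2, 0.2), (0, 1, 6.0), (1, 2, 0.2)]
types, fractions = type_matrix(cell_matrix(contacts, 3), ["A2", "RBC", "CBC6"])
print(types)
print(fractions.round(2))
```

Each row of `fractions` gives the share of a type's total detected contact area devoted to every other type, which is the form in which percentages such as those for the A2 cell are quoted.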

Even when two cell types strongly contact each other (Fig. 4b), the 
contact area between each individual pair of cells, one from each type, 
varies widely (Fig. 4a). To test whether individual cells still form 
reliable channels of information, we compared how the total contact 
area that a cell of type A makes with all cells of type B varies among the 
cells of type A. For example, the contact areas between individual ON 
SACs and all cells of the CBC5R ‘type’ (9.9% on average, the most 
strongly contacted one among the CBCs) vary by only about 16% 
(s.d./mean; Fig. 4c). At the same time, the contact area between A2 
amacrine cells (ac52-90) and all RBCs, which is on average even 
stronger (24%), fluctuates more widely, by 25% (s.d./mean). 
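The variability measure used here (s.d./mean of the fractional contact area across the cells of one type) can be written down in a few lines. The sketch below assumes a symmetric cell-to-cell contact-area matrix and a list of type labels; the function name and the choice of the sample standard deviation (`ddof=1`) are assumptions, since the text does not specify the estimator.

```python
import numpy as np

def contact_variability(cell_m, cell_types, type_a, type_b):
    """Mean and s.d./mean of the fraction of contact area that cells of
    type_a devote to all cells of type_b."""
    types = np.asarray(cell_types)
    rows = cell_m[types == type_a]             # one row per cell of type A
    to_b = rows[:, types == type_b].sum(axis=1)
    fraction = to_b / rows.sum(axis=1)         # normalized to each cell's total
    mean = fraction.mean()
    return mean, fraction.std(ddof=1) / mean

# Toy matrix: two SACs devoting 10% and 30% of their contact area to one CBC.
m = np.zeros((4, 4))
m[0, 2] = m[2, 0] = 1.0
m[0, 3] = m[3, 0] = 9.0
m[1, 2] = m[2, 1] = 3.0
m[1, 3] = m[3, 1] = 7.0
mean, cv = contact_variability(m, ["SAC", "SAC", "CBC", "other"], "SAC", "CBC")
print(round(mean, 3), round(cv, 3))
```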

To test how much information about the actual synaptic connecti- 
vity is provided by our contact-area measurements, we used the size 
distributions for synaptic and incidental contacts, measured in a data 
set (k563, ref. 5) with prominently stained synaptic vesicles and thick- 
enings (Fig. 3d, e and Supplementary Data 3a, b), to estimate, for all 
CBC-ganglion cell pairs, how many true synaptic contacts to expect for 
a given total contact area between two cells (Fig. 4d), and found that for 
a total contact area as small as 0.08 µm², at least one synaptic contact
exists with a probability of 50%, increasing to 95% for an area of 1 µm².
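A stochastic estimate of this kind can be sketched as a small Monte Carlo simulation. The sketch below is not the authors' procedure: it assumes that the per-contact synapse probability is the count-weighted ratio of two Gaussian curves with the centre/width values quoted in the Fig. 3e legend, that "width" means standard deviation, and that contacts are independent. It is not claimed to reproduce the 50% and 95% figures given above.

```python
import numpy as np

def gaussian(area, centre, width):
    """Unnormalized Gaussian curve; 'width' is treated as a standard deviation."""
    return np.exp(-0.5 * ((area - centre) / width) ** 2)

def synapse_probability(area, n_syn=30, n_non=63,
                        syn=(0.22, 1.13), non=(0.18, 0.38)):
    """Probability that a contact of the given area (µm²) is synaptic,
    assuming a count-weighted ratio of the two fitted size distributions."""
    s = n_syn * gaussian(area, *syn) / syn[1]
    n = n_non * gaussian(area, *non) / non[1]
    return s / (s + n)

def simulate_synapse_counts(contact_areas, runs=1000, seed=0):
    """Draw, for each run, which contacts of one cell pair are synaptic and
    return the number of synaptic contacts per run."""
    rng = np.random.default_rng(seed)
    p = synapse_probability(np.asarray(contact_areas, dtype=float))
    draws = rng.random((runs, p.size)) < p
    return draws.sum(axis=1)

# Hypothetical cell pair with three contacts of different size.
counts = simulate_synapse_counts([0.1, 0.5, 2.0])
print(np.percentile(counts, [5, 50, 95]))
print((counts >= 1).mean())  # estimated probability of at least one synapse
```

The percentiles of `counts` correspond to the kind of median and confidence bands plotted against total contact area in Fig. 4d.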


Figure 3 (continued) | e, Frequencies (bottom) of non-synaptic (red, n = 63)
and synaptic (green, n = 30)
cell-cell contacts versus contact area and the
Gaussian fits to them (thin lines, centre/width:
0.18/0.38 and 0.22/1.13, all in µm²) and the
resulting synapse probability (syn. prob.) estimate
(top). Scale bars, 1 µm (a), 500 nm (d).

Connectivity-based type classification
We next explored whether comprehensive contact information contained
in the cell-to-cell matrix can be used to discriminate between
otherwise very similar cell types. When we searched for a way to
divide the CBC5s, which fall into two molecularly distinguishable 
classes in rats’ and are too numerous for a single class in mouse”, 
by using their connectivities to ganglion cells and amacrine cells, 
gc31-56 and gc36-51 emerged as potential discriminators (Fig. 5a). 
A reasonably complete tiling pattern resulted (Fig. 5b) when including 
only cells (n = 22) contacting gc31-56 more strongly than gc36-51 
(the exception was a single cell, which was near that threshold but was 
not included to avoid strong axonal overlap; asterisk in Fig. 5a). This 
group of cells, ‘CBC5A’, also shows a strong repulsion between their
dendritic centroids (Fig. 5c), indicating a mosaic and hence a pure 
type”, and is specifically avoided by ac43-49 (Fig. 5a). The remaining 
37 cells (‘CBC5R’) still show strong axonal overlap, lack a mosaic 
(Fig. 5b, c), and are thus probably a mixture of types for which we 
did not, however, find a connectivity-based discriminator. The depth 
profiles of CBC5A (first and third quartiles: 54% and 61%) and 
CBC5R (50% and 59%; Fig. 5d) seem to be different. Ten cells did 
not overlap the dendrites of both ganglion cell types (Fig. 5b) and were 
therefore collected into a separate group (‘CBC5X’).
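The connectivity-based split of the CBC5s can be summarized as a simple rule on two contact areas. The following is a minimal sketch under stated assumptions: the function name, the argument names and the handling of cells without any contact to either ganglion cell type are hypothetical, and the manual exception for the one cell near the threshold is not modelled.

```python
def classify_cbc5(area_gc31_56, area_gc36_51, overlaps_both):
    """Assign a CBC5 cell to 'CBC5A', 'CBC5R' or 'CBC5X'.

    area_gc31_56, area_gc36_51: total contact area (µm²) with each ganglion cell type.
    overlaps_both: True if the cell lies in the region containing dendrites
                   of both ganglion cell types.
    """
    if not overlaps_both:
        return "CBC5X"  # cannot be tested against both discriminators
    total = area_gc31_56 + area_gc36_51
    if total == 0:
        return "CBC5R"  # assumption of this sketch: no preference, so not 5A
    preference = area_gc31_56 / total  # quantity plotted as the black trace in Fig. 5a
    return "CBC5A" if preference > 0.5 else "CBC5R"

# Hypothetical cells: (area with gc31-56, area with gc36-51, overlaps both types).
cells = [(2.0, 0.5, True), (0.3, 1.2, True), (1.0, 0.0, False)]
print([classify_cbc5(*cell) for cell in cells])
```

Sorting cells by decreasing `preference` gives the ordering used on the x axis of Fig. 5a; the threshold of 0.5 corresponds to contacting gc31-56 more strongly than gc36-51.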


XBC circuits 

We next investigated how the XBC is integrated into the IPL circuitry 
(Fig. 6a-c). Like RBC and CBC7, XBC devotes less of its contact area
to ganglion cells than the average bipolar cell (Fig. 6a). XBC strongly 
contacts (Supplementary Data 2b) medium/wide-field amacrine cells 
ac38-56 (15.5%) and ac53-59 (7.1%), of which ac53-59 shares the 
XBC sharp depth profile (Fig. 6b) and, in turn, makes contact with 
gc31-56 (3.5%) and gc47-57 (4.2%). Those ganglion cells, however, 
receive only minimal amounts (0.9% and 0.4%) of their contacts 
directly from the XBC, even though their dendrites strongly overlap 
XBC axons in depth (Fig. 6b). Instead gc31-56 receives direct bipolar 
cell contacts mainly from CBC5A (7.0%) and gc47-57 from CBC5R 
(12.0%). ac38-56 is bistratified, overlapping in the ON stratum with 
the XBC and in the OFF stratum with gc35-41 (Fig. 6b, c), which is 
clearly an OFF cell (contacting CBCs 3A, 3B and 4, with 5.4%, 6.3% 
and 5.4%, respectively; all other CBCs are at most 0.5%) and receives 
10.0% of its contacts from ac38-56. 


Figure 4 | Contact matrices. a, Cell-cell contact-size matrix (see also
Supplementary Data 4 and 5). Classes: ganglion cells (GCs), amacrine cells
(ACs), bipolar cells (BPCs), from left to right. Ordering within classes: ganglion
cells, types by depth (average of first and third quartile) in the IPL. Amacrine
cells: narrow-field (nf), medium/wide-field (mf/wf) (including displaced).
Within sub-classes: by depth, except bipolar cells (by numbering in ref. 28, XBC
after 5X, and RBC last). Within types: random order. Dark lines along top and
right side: every other type. Inset (bottom left): contact area-to-grey value (grey
val.) mapping and prevalence (prev.). b, Type-type matrix, normalized along
rows; along left edge: cells/type (note log scale); along bottom edge: median and
depth range (between quartiles) for each type. c, Total contact area between
each ON SAC (ac60-65, cell numbers along the x axis) and, respectively, all
CBCs and RBCs, normalized to the total contact area of each SAC. d, The
number of true synaptic contacts expected (stochastic simulation, 1,000 runs
per cell pair) for each actual CBC-to-ganglion-cell pair using the fits in Fig. 3e
versus the total cell-cell contact area (median: green line, lower and upper 95%
confidence levels, red lines).

Figure 5 | CBC5 subtypes. a, CBC5s ordered by decreasing preference for
gc31-56 over gc36-51 (black trace: contact area with all gc31-56s/sum of contact
areas to gc31-56 and gc36-51). Also shown is the contact area with ac43-49
(A_ac43-49, green trace, relative to max). Dashed line denotes border between
CBC5A and CBC5R. Asterisk denotes CBC5 cell switched to CBC5R to avoid
mosaic violation. b, Light-axis views of CBC5A (5A; top) and CBC5R (5R;
bottom) axons. Red outline: region containing dendrites from both gc36-51
and gc31-56; thin dashed line: data set border. Scale bars, 10 µm. c, Variation of
dendritic-centroid nearest-neighbour distances (NN_dist) (standard deviation/
median) for: all CBC5s, only 5A, only 5R, the mixture of 6 and 7, only 6, only 7,
and a set of 1,000 simulations randomly placing 22 points (error bar denotes
fifth to ninety-fifth percentile). d, Normalized skeleton density depth profiles.

ON/OFF ganglion cell circuits
Some of the best studied ganglion cells respond to both ON and OFF
stimuli. We therefore analysed the connection patterns onto several
ganglion cells that ramify in both ON and OFF layers (Fig. 6d-f).
Among those, gc36-51 (‘W3a’) and gc44-52 (‘W3b’) are consistent
with cells labelled in the TYW3 mouse™. Either or both are likely to be
homologous to what is called the ‘local edge detector’ in rabbit**”°
(Fig. 6d). Their contact patterns with CBCs are mostly similar
(gc36-51/gc44-52: CBC5R, 7.5%/11.5%; CBC5A, 1.3%/0.8%; CBC4,
3.0%/3.9%; CBC3A, 1.7%/1.8%; and CBC3B, 3.2%/1.7%; Supplementary
Data 1), with the exception of the outermost part of the inner
nuclear layer (INL) (CBC2, 1.5%/0.1%, and CBC1, 1.6%/0.1%). 
Substantial contacts are made by gc36-51 and gc44-52 with several 
narrow-field amacrine cells, ac52-90 (6.0%/2.8% (ref. 37), A2), ac21- 
67 (3.8%/2.1%), ac51-70 (3.5%/5.0%) and ac21-44 (3.3%/2.2%). The 
strongest amacrine cell contact made by gc36-51 is with ac43-49 
(6.8%), which straddles the boundary between ON and OFF layers 
(Supplementary Data 1), and also substantially contacts gc44-52 
(5.6%) as well as ON and OFF bipolar cells (CBC5R, 9.3%, and 
CBC4, 5.0%). ac43-49 is one of two medium/wide-field amacrine cells 
that dedicate most of their contacts to gc36-51 and gc44-52 (Sup- 
plementary Data 1). The second is ac44-54 (7.0%/6.2%), a cell domi- 
nated by ON CBCs (7.9% with CBC5R compared to 1.3%, 2.0% and 
1.2%, with CBCs 3A, 3B and 4, respectively). 

The ON/OFF DSGC (gc30-63, Fig. 6f), as expected**', strongly
contacts SACs (9.2% and 11.4%, for ac25-31 (OFF SAC) and ac60-65
(ON SAC)), but substantial contacts from other medium/wide-field
amacrine cells are conspicuously absent (ac34-84, 2.5%, all
others < 1.6%). Like gc36-51/gc44-52 (W3a/b), the DSGC prefers
CBC5R (6.9%) to CBC5A (1.9%, all other ON CBCs at most 1.1%).
Its main OFF ‘input’ comes from CBC4 (3.2%) and CBC3 (A/B, 3.0%/2.7%).
SACs make most contacts (Fig. 6e) among themselves (26.6%
and 21.4% for ON and OFF). They discriminate less than the DSGC
between CBC5R (9.7%) and CBC5A (5.0%), but, most notably, contact
CBC7 (5.1%), which is largely ignored by the DSGC (1.1%, Fig. 6f).
Similar differences are seen for the OFF sublamina: DSGC and SAC
contact strengths to CBC1/CBC2 are 1.4%/0.5% and 4.7%/3.1%,
respectively.

Figure 6 | Circuits originating from the XBC and ON/OFF cells. a-f, IPL
circuitry from the XBC (a-c) and from three ON/OFF cells (d-f). a, Fractional
contact areas between all ganglion cells and each bipolar cell. b, Depth profiles.
c, XBC circuit schematic. d, One gc36-51/W3a (cell 16), one ac43-49
(cell 307), one CBC5R (cell 578) and all the detected contacts between them
(cyan spheres, volume proportional to contact area). e, Normalized contact
areas for amacrine cells and bipolar cells with both SACs (ac25-31, blue;
ac60-65, red). f, Circuit diagrams. Arrow width in circuit diagrams
proportional to (total contact area between types)”°. Only connections with
areas per type >30 µm² shown.

Our last example is the analysis of a cell not associated with any 
known type in mouse but possibly homologous to a rabbit retina”® 
ON/OFF ganglion cell. gc31-56 is an ON/OFF cell by stratification 
(Fig. 2a, b), filling the space between the SAC bands (Fig. 2a, e), and 
‘connects’ strongly to both SACs (ac60-65, 5.4%, and ac25-31, 7.1%). 
Surprising is the strong imbalance between ON and OFF bipolar cell 
‘input’ (7.0%/3.7% for CBC5A/R, but only 0.8%, 0.7%, 0.9%, 1.5% and 
1.2%, for CBC1, 2, 3A, 3B and 4). 


Discussion 


Our comprehensive analysis of the bipolar cells confirmed the exist- 
ence of the ten bipolar cell types previously identified’*, and revealed 
the existence of the XBC, which had not emerged even in large genetic 
screens’. Although sharp stratification and large size (Fig. 1d, e and 
Supplementary Data 2a, also note the similarity to cluster 6 in ref. 40) 
suggest homology between the XBC and the giant bipolar cell des- 
cribed recently in the primate retina*', the small size of the XBC 
dendrites relative to its axonal arbour argues against it. The functional 
role of the XBC is unclear. Its sparseness suggests low spatial resolu- 
tion and its small dendritic fields suggest that it does not collect signals 
from all cells of one cone type, thus potentially forgoing some amount 
of signal. Curious is the absence of a bipolar cell with a similarly sharp 
stratification on the OFF side. Instead, we find an inter-layer connec- 
tion via the symmetrically bistratified ac38-56 (Fig. 6b, c). One might 
speculate that the XBC is part of a luminance adaptation pathway. 

Dense sampling and the complete high-resolution reconstruction
of neurites, as is only possible with three-dimensional electron microscopy
data, contribute in several ways to cell-type classification. First,
when all cells of a class, for example, all bipolar cells, are reconstructed,
no type will be missed and the prevalence of different types
can be determined precisely (Fig. 1e, inset). Second, differences in
neurite geometry can be compared for cells within the same piece 
of tissue. For almost all bipolar cells and a substantial fraction of 
ganglion and amacrine cells, it was thus possible to establish a cor- 
respondence to cell types described in the literature (Supplementary 
Data 7). We generally erred on the side of splitting groups and expect 
that some groups actually belong to the same type (for example, the 
similar connectivity to the XBC suggests that ac38-56 and ac37-52 
could be the same type; Supplementary Data 4). Third, even if they
cannot be selectively stained and imaged, tiling and mosaic formation 
(both used to assess purity of type*’) can be easily assessed (Figs 2b 
and 5b and Supplementary Data 1). Fourth, complete contact informa- 
tion can confirm or refine the definition of types (Fig. 5), and may 
ultimately become sufficient for classification all by itself”. 

Because of the constrained size of our data set, many amacrine cell 
and all ganglion cell neurites are truncated, and many larger neuron 
types are presumably completely missed***’. Advances in volume 
electron microscopy technology’* now make it possible to acquire 
volumes with a lateral extent of at least 500 µm. One might then, using
the same tools and a similar manual annotation effort as were used in
our study, densely reconstruct a central region of 100 µm in extent
and trace neurites of passage far enough into the periphery to deter- 
mine their cell type. 

Although our analysis provides contact areas and not synaptic 
strength, the absence of contact always indicates a lack of synaptic 
connection. The absence of contacts between some cell types, for 
example, XBC and gc35-41 as well as CBC7 and DSGC, the neurites 
of which mingle extensively, confirms that Peters’ rule¹⁰ is routinely
violated. Furthermore, it seems that large contacts are quite likely to 
be synaptic (at least between bipolar cells and ganglion cells; Figs 3e 
and 4d). Although we have not used them here, other geometric 
parameters describing contact shape might provide enough additional
information to identify actual contacts with near certainty for many
types of synapses. 

It has been our consistent experience that selectively enhancing 
cell-surface contrast’ simplifies manual tracing and enables automatic 
volume segmentation. If recent results that suggest that even conven- 
tionally stained tissue can be reliably traced by hand (K.L.B. and M.H., 
unpublished observations) and automatically segmented (M. Berning 
and M.H., personal communication) are confirmed, it may no longer be
necessary to trade traceability for synapse identification. 

The reliability of the entries in the contact matrix depends on 
several factors. Likely dominant are neurite-continuity errors, which 
occur roughly six times per bipolar cell”* but presumably mostly in the 
periphery and thus should cause only a small fractional loss (or false 
addition) of synapses. Local volume reconstruction seems to be fairly 
reliable. Finally, although not all contacts are synaptic, there are, typ- 
ically, many contacts between any actually connected pair of cells****, 
making it unlikely that any strong connection is spurious. The con- 
nectivity estimate between CBC5R (38 cells) and W3a (gc36-51, 3 
cells), for example, is based on the areas of 1,358 observed contacts, 
for which our simulation predicts between 278 and 705 synaptic con- 
tacts (fifth and ninety-fifth percentiles, respectively) with a median of 
483 contacts, that is, 13 per bipolar cell and 161 per ganglion cell. The 
direction of a potential synaptic connection can in most cases not be
determined by visually inspecting the e2006 data set, but contacts onto a
ganglion cell, for example, are presumably never postsynaptic.

Our analysis of three ON/OFF layer cell types has several concrete 
functional implications, which, at the very least, will guide further 
exploration by other means. For example, ‘bright’ W3 cells (gc36-51 
or gc44-52) respond much more vigorously to a small darkening spot 
than to a small brightening spot (Fig. 5b in ref. 47). This may well be 
due to the ac44-54-mediated feed-forward inhibitory pathway on the 
ON side (Fig. 6f and Supplementary Data 1), for which no corresponding
pathway is seen on the OFF side. W3b-CBC contacts are
concentrated on the ON side (ON/OFF: 15.2%/7.6%) but evenly 
balanced for W3a (10.9%/10.9%), which suggests that it is W3a 
(gc36-51) that corresponds to the physiologically examined cells 
described previously”. Another characteristic of the W3 cell is that 
its response is completely suppressed by movement in the receptive- 
field surround”. Given the lack of thin, unbranched processes emerging 
from its soma, it is unlikely that ac43-49 corresponds to the poly-axonal 
amacrine cell implicated in this suppression*’, but ac43-49 may well 
mediate (or at least augment) suppression for stimuli in the near- 
surround (M. Meister, personal communication). It could do this for 
OFF and for ON stimuli because it is contacted by CBCs in both OFF 
and ON layers (Fig. 6f and Supplementary Data 1). 

In addition to DSGCs (gc30-63), SACs contact gc31-56 strongly 
(Fig. 6f). It will be interesting to find out whether gc31-56 is also 
direction sensitive, or at least motion sensitive, and why there is a 
morphological symmetry (Fig. 2b) between the ON and OFF layers in 
gc31-56 but a strong imbalance between the strong ON bipolar cell 
and the weak OFF bipolar cell ‘input’ (Fig. 6f). 

The circuit motifs found for W3a/b, XBC and gc31-56 are only the 
first of many examples of motifs likely to be found when these data (a 
repository of raw data, skeletons and volume segmentation can be 
found at http://www.neuro.mpg.de/connectomics) are examined in 
the context of virtually every functional question in the retina. 


METHODS SUMMARY 

Tissue preparation for SBEM. The retinae for the e2006 and k563 data sets were 
prepared as described previously’. 

SBEM imaging and data analysis. The sample was mounted in a custom-built 
ultra-microtome operating inside the chamber of a field-emission scanning elec- 
tron microscope (FEI QuantaFEG 200), and serial block-face imaged under 
130 Pa hydrogen, at 3 keV landing energy, a dose of 14 electrons per nm², and
a resolution of 16.5 × 16.5 × 25 nm³ (for the conventionally stained sample, see
Methods). A custom-designed back scattered-electron detector was used. SBEM 
data were aligned and stitched using custom Matlab routines. Skeletons were
manually traced by trained student annotators using custom written software
(KNOSSOS, http://www.knossostool.org) and consolidated using RESCOP”. 
Volumes were traced using KLEE (M.H. et al., manuscript in preparation). Boundary 
classification was with a five-hidden-layer convolutional neural network that was 
trained using the MALIS procedure” (S.C.T. et al., manuscript in preparation). 
Segmentation used a 15-step iterative growth procedure, followed by a 6-step 
merging procedure. Data visualization was in KLEE, Knossos, Matlab, Mathe- 
matica and Amira. 


Full Methods and any associated references are available in the online version of 
the paper. 


Received 8 January; accepted 3 June 2013. 


1. White, J. G., Southgate, E., Thomson, J. N. & Brenner, S. The structure of the nervous
system of the nematode Caenorhabditis elegans. Phil. Trans. R. Soc. Lond. B 314,
1-340 (1986).

2. Varshney, L. R., Chen, B. L., Paniagua, E., Hall, D. H. & Chklovskii, D. B. Structural
properties of the Caenorhabditis elegans neuronal network. PLoS Comput. Biol. 7,
e1001066 (2011).

3. Binzegger, T., Douglas, R. J. & Martin, K. A. C. A quantitative map of the circuit of cat
primary visual cortex. J. Neurosci. 24, 8441-8453 (2004).

4. Helmstaedter, M., de Kock, C. P., Feldmeyer, D., Bruno, R. M. & Sakmann, B.
Reconstruction of an average cortical column in silico. Brain Res. Rev. 55, 193-203
(2007).

5. Briggman, K. L., Helmstaedter, M. & Denk, W. Wiring specificity in the direction-
selectivity circuit of the retina. Nature 471, 183-188 (2011).

6. Stepanyants, A. & Chklovskii, D. B. Neurogeometry and potential synaptic
connectivity. Trends Neurosci. 28, 387-394 (2005).

7. Mishchenko, Y. et al. Ultrastructural analysis of hippocampal neuropil from the
connectomics perspective. Neuron 67, 1009-1020 (2010).

8. Hill, S. L., Wang, Y., Riachi, I., Schürmann, F. & Markram, H. Statistical connectivity
provides a sufficient foundation for specific functional connectivity in neocortical
neural microcircuits. Proc. Natl Acad. Sci. USA 109, E2885-E2894 (2012).

9. Denk, W., Briggman, K. L. & Helmstaedter, M. Structural neurobiology: missing link
to a mechanistic understanding of neural computation. Nature Rev. Neurosci. 13,
351-358 (2012).

10. Peters, A. Thalamic input to the cerebral cortex. Trends Neurosci. 2, 183-185
(1979).

11. Markram, H., Lübke, J., Frotscher, M., Roth, A. & Sakmann, B. Physiology and
anatomy of synaptic connections between thick tufted pyramidal neurones in the
developing rat neocortex. J. Physiol. (Lond.) 500, 409-440 (1997).

12. Fried, S. I., Münch, T. A. & Werblin, F. S. Mechanisms and circuitry underlying
directional selectivity in the retina. Nature 420, 411-414 (2002).

13. Asari, H. & Meister, M. Divergence of visual channels in the inner retina. Nature
Neurosci. 15, 1581-1589 (2012).

14. Stevens, J. K., Davis, T. L., Friedman, N. & Sterling, P. A systematic approach to
reconstructing microcircuitry by electron microscopy of serial sections. Brain Res.
, 265-293 (1980).

15. Sterling, P. Microcircuitry of the cat retina. Annu. Rev. Neurosci. 6, 149-185 (1983).

16. Famiglietti, E. V. Synaptic organization of starburst amacrine cells in rabbit retina:
analysis of serial thin sections by electron microscopy and graphic reconstruction.
J. Comp. Neurol. 309, 40-70 (1991).

17. McGuire, B. A., Stevens, J. K. & Sterling, P. Microcircuitry of bipolar cells in cat
retina. J. Neurosci. 4, 2920-2938 (1984).

18. Briggman, K. L. & Bock, D. D. Volume electron microscopy for neuronal circuit
reconstruction. Curr. Opin. Neurobiol. 22, 154-161 (2012).

19. Masland, R. H. The neuronal organization of the retina. Neuron 76, 266-280
(2012).

20. Vaney, D. I., Sivyer, B. & Taylor, W. R. Direction selectivity in the retina: symmetry
and asymmetry in structure and function. Nature Rev. Neurosci. 13, 194-208
(2012).

21. Euler, T., Detwiler, P. B. & Denk, W. Directionally selective calcium signals in
dendrites of starburst amacrine cells. Nature 418, 845-852 (2002).

22. Zhou, Z. J. & Lee, S. Synaptic physiology of direction selectivity in the retina.
J. Physiol. (Lond.) 586, 4371-4376 (2008).

23. Wei, W., Hamby, A. M., Zhou, K. & Feller, M. B. Development of asymmetric
inhibition underlying direction selectivity in the retina. Nature 469, 402-406
(2011).

24. Denk, W. & Horstmann, H. Serial block-face scanning electron microscopy to
reconstruct three-dimensional tissue nanostructure. PLoS Biol. 2, e329 (2004).

25. Helmstaedter, M., Briggman, K. L. & Denk, W. High-accuracy neurite reconstruction
for high-throughput neuroanatomy. Nature Neurosci. 14, 1081-1088 (2011).

26. Jain, V. et al. Supervised learning of image restoration with convolutional networks.
IEEE 11th International Conference on Computer Vision 2, 1-8 (2007).

27. Turaga, S. C. et al. Convolutional networks can learn to generate affinity graphs for
image segmentation. Neural Comput. 22, 511-538 (2010).

28. Wässle, H., Puller, C., Müller, F. & Haverkamp, S. Cone contacts, mosaics, and
territories of bipolar cells in the mouse retina. J. Neurosci. 29, 106-117 (2009).

29. Amthor, F. R., Oyster, C. W. & Takahashi, E. S. Morphology of on-off direction-
selective ganglion cells in the rabbit retina. Brain Res. 298, 187-190 (1984).

30. Strettoi, E., Raviola, E. & Dacheux, R. F. Synaptic connections of the narrow-field,
bistratified rod amacrine cell (AII) in the rabbit retina. J. Comp. Neurol. 325,
152-168 (1992).




8 AUGUST 2013 | VOL 500 | NATURE | 173 


©2013 Macmillan Publishers Limited. All rights reserved 




31. Macneil, M. A., Heussy, J. K., Dacheux, R. F., Raviola, E. & Masland, R. H. The shapes 
and numbers of amacrine cells: matching of photofilled with Golgi-stained cells in 
the rabbit retina and comparison with other mammalian species. J. Comp. Neurol. 
413, 305-326 (1999). 

32. Fyk-Kolodziej, B. & Pourcho, R. G. Differential distribution of 
hyperpolarization-activated and cyclic nucleotide-gated channels in cone bipolar cells 
of the rat retina. J. Comp. Neurol. 501, 891-903 (2007). 

33. Wassle, H. & Riemann, H. J. Mosaic of nerve-cells in mammalian retina. Proc. R. Soc. 
Lond. B 200, 441-461 (1978). 

34. Kim, I. J., Zhang, Y., Meister, M. & Sanes, J. R. Laminar restriction of retinal ganglion 
cell dendrites and axons: subtype-specific developmental patterns revealed with 
transgenic markers. J. Neurosci. 30, 1452-1462 (2010). 

35. Levick, W. R. Receptive fields and trigger features of ganglion cells in the visual 
streak of the rabbit's retina. J. Physiol. (Lond.) 188, 285-307 (1967). 

36. Amthor, F. R., Takahashi, E. S. & Oyster, C. W. Morphologies of rabbit retinal 
ganglion cells with complex receptive fields. J. Comp. Neurol. 280, 97-121 (1989). 

37. Kolb, H., Nelson, R. & Mariani, A. Amacrine cells, bipolar cells and ganglion cells of 
the cat retina: a Golgi study. Vision Res. 21, 1081-1114 (1981). 

38. Sivyer, B., Venkataramani, S., Taylor, W. R. & Vaney, D. I. A novel type of complex 
ganglion cell in rabbit retina. J. Comp. Neurol. 519, 3128-3138 (2011). 

39. Siegert, S. et al. Genetic address book for retinal cell types. Nature Neurosci. 12, 
1197-1204 (2009). 

40. Badea, T. C. & Nathans, J. Quantitative analysis of neuronal morphologies in the 
mouse retina visualized by using a genetically directed reporter. J. Comp. Neurol. 
480, 331-351 (2004). 

41. Joo, H. R., Peterson, B. B., Haun, T. J. & Dacey, D. M. Characterization of a novel 
large-field cone bipolar cell type in the primate retina: evidence for selective cone 
connections. Vis. Neurosci. 28, 29-37 (2011). 

42. Seung, H. S. Reading the book of memory: sparse sampling versus dense mapping 
of connectomes. Neuron 62, 17-29 (2009). 

43. Masland, R. H. The fundamental plan of the retina. Nature Neurosci. 4, 877-886 
(2001). 

44. Andres, B. et al. in Computer Vision-ECCV 2012 Lecture Notes in Computer Science 
(eds Fitzgibbon, A. et al.) 778-791 (Springer, 2012). 

45. Tsukamoto, Y., Morigiwa, K., Ueda, M. & Sterling, P. Microcircuits for night vision in 
mouse retina. J. Neurosci. 21, 8616-8623 (2001). 

46. Calkins, D. J. & Sterling, P. Microcircuitry for two types of achromatic ganglion cell 
in primate fovea. J. Neurosci. 27, 2646-2653 (2007). 

47. Zhang, Y., Kim, I. J., Sanes, J. R. & Meister, M. The most numerous ganglion cell type 
of the mouse retina is a selective feature detector. Proc. Natl Acad. Sci. USA 109, 
E2391-E2398 (2012). 

48. Olveczky, B. P., Baccus, S. A. & Meister, M. Segregation of object and background 
motion in the retina. Nature 423, 401-408 (2003). 

49. Turaga, S. C., Briggman, K., Helmstaedter, M., Denk, W. & Seung, H. S. Maximin 
affinity learning of image segmentation. Adv. Neural Info. Proc. Syst. 22, 1-8 (2009). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank J. Diamond, T. Euler, R. Masland, M. Meister and J. Sanes 
for discussions, J. Kornfeld and F. Svara for programming and continually improving 
KNOSSOS, M. Miller and J. Tritthardt for programming and building instrumentation, 
C. Roome for IT support, and A. Borst, M. Fee, T. Gollisch and A. Karpova for comments 
on the manuscript. We especially thank F. Isensee for help with synapse identification. 
We thank P. Bastians, A. Biasotto, F. Drawitsch, H. Falk, A. Gable, M. Grohmann, 
A. Gabelein, J. Hanne, F. Isensee, H. Jakobi, M. Kotchourko, E. Moller, J. Pollmann, 
C. Rohrig, A. Rommerskirchen, L. Schreiber, C. Willburger, H. Wissler and J. Youm for 
reconstruction management and annotator training, and N. Abazova, S. Abele, 
O. Aderhold, C. Altburger, T. Amberger, K. Aninditha, A. Antunes, E. Atsiatorme, 
H. Augenstein, I. Bartsch, I. Barz, P. Bastians, J. Bauer, H. Bauersachs, R. Bay, J. Becker, 
M. Beez, S. Bender, M. Berberich, I. Bertlich, J. Bewersdorf, A. Biasotto, P. Biti, 
M. Bittmann, K. Bretzel, J. Briegel, E. Buckler, A. Buntjer, C. Burkhardt, S. Buhler, 
S. Daum, N. Demir, E. Demirel, S. Dettmer, M. Diemer, J. Dietrich, S. Dittrich, C. Domnick, 
F. Drawitsch, C. Eck, L. Ehm, S. Ehrhardt, T. Eliguezel, K. Ernst, O. Eryilmaz, F. Euler, 
H. Falk, K. Fischer, K. Foerster, R. Foitzik, A. Foltin, R. Foltin, S. Freiß, A. Gable, P. Gallandi, 
K. Garbe, A. Gebhardt, F. Gebhart, S. Gottwalt, A. Greis, M. Grohmann, A. Gromann, 
S. Grolbner, E. Grün, M. Grün, K. Guo, A. Gabelein, K. Haase, J. Hammerich, J. Hanne, 
B. Hauber, M. Hensen, F. Hentzschel, M. Herberz, M. Heumannskamper, C. Hilbert, 
L. Hofmann, P. Hofmann, T. Hondrich, U. Hausler, M. Höreth, J. Hugle, F. Isensee, 
A. Ivanova, F. Jahnke, H. Jakobi, M. Joel, M. Jonczyk, A. Joschko, A. Jünger, K. Kappler, 
S. Kaspar, C. Kehrel, J. Kern, K. Keßler, S. Khoury, M. Kiapes, M. Kirchberger, A. Klein, 
C. Klein, S. Klein, J. Kratzer, C. Kraut, P. Kremer, P. Kretzer, F. Kröller, D. Kruger, 
. Kuderer, S. Kull, S. Kwakman, S. Laiouar, L. Lebelt, H. Lesch, R. Lichtenberger, 
J. Liermann, C. Lieven, J. Lin, B. Linser, S. Lorger, J. Lott, D. Luft, L. Lust, J. Loffler, 
C. Marschall, B. Martin, D. Maton, B. Mayer, S. Mayorca de Ituarte, M. Meleux, C. Meyer, 
. Moll, T. Moll, L. Mroszewski, E. Moller, M. Muller, L. Munster, N. Nasresfahani, 
J. Nassal, M. Neuschwanger, C. Nguyen, J. Nguyen, N. Nitsche, S. Oberrauch, F. Obitz, 
D. Ollech, C. Orlik, T. Otolski, S. Oumohand, A. Palfi, J. Pesch, M. Pfarr, S. Pfarr, 
M. Pohrath, J. Pollmann, M. Prokscha, S. Putzke, E. Rachmad, M. Reichert, J. Reinhardt, 
. Reitz, J. Remus, M. Richter, M. Richter, J. Ricken, N. Rieger, F. Rodriguez Jahnke, 
A. Rommerskirchen, M. Roth, I. Rummer, J. Ratzer, C. Rohrig, J. Rother, V. Saratov, 
E. Sauter, T. Schackel, M. Schamberger, M. Scheller, J. Schied, M. Schiedeck, J. Schiele, 
K. Schleich, M. Schlosser, S. Schmidt, C. Schneeweis, K. Schramm, M. Schramm, 
L. Schreiber, D. Schwarz, A. Schurholz, L. Schütz, A. Seitz, C. Sellmann, E. Serger, 
J. Sieber, L. Silbermann, I. Sonntag, T. Speck, Y. Sohngen, T. Tannig, N. Tisch, V. Tran, 
J. Trendel, M. Uhrig, D. Vecsei, F. Viehweger, V. Viehweger, R. Vogel, A. Vogel, J. Volz, 
P. Weber, K. Wegmeyer, J. Wiederspohn, E. Wiegand, R. Wiggers, C. Willburger, 
H. Wissler, V. Wissdorf, S. Worner, J. Youm, A. Zegarra, J. Zeilfelder, F. Zickgraf and 
T. Ziegler for cell reconstruction. This work was supported by the Max-Planck Society 
and the DFG (Leibniz prize to W.D.). H.S.S. is grateful for support from the Gatsby 
Charitable Foundation. 


Author Contributions M.H. and W.D. designed the study. K.L.B. prepared the samples 
and acquired the data using a microtome designed by W.D. M.H. analysed the data, with 
minor contributions from W.D. S.C.T., V.J. and H.S.S. developed the boundary classifier. 
M.H., K.L.B. and W.D. wrote the paper. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare competing financial interests: details 
accompany the full-text HTML version of the paper at www.nature.com/nature. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to M.H. 
(mhelmstaedter@neuro.mpg.de). 




METHODS 


Data acquisition. A retina from a 30-day-old C57BL/6 mouse (data set e2006) was 
prepared to selectively enhance cell outlines by using the horseradish peroxidase 
(HRP)-mediated precipitation of 3,3’-diaminobenzidine (DAB), as described 
previously, and stained with osmium and lead citrate. The shrinkage of our tissue 
was very likely the same as that for the e2198 sample, which was imaged in the 
living state by two-photon microscopy and then by SBEM, allowing a precise 
estimate (14%) of the linear shrinkage factor (K.L.B. et al., unpublished observations). 
All procedures were approved by the local animal care committee and were 
in accordance with the law of animal experimentation issued by the German 
Federal Government. 

The embedded tissue was trimmed to a block face of ~200 µm × 300 µm and 
imaged in a scanning electron microscope with a field-emission cathode (QuantaFEG 
200, FEI Company) and a custom-designed backscattered-electron detector based on 
a silicon diode (AXUV, International Radiation Detectors) combined with a custom-built 
current amplifier (J. Tritthardt, Max Planck Institute for Medical Research, 
electronics shop). The incident-electron energy was 3.0 keV and the beam current 
~100 pA. At a pixel dwell time of 6 µs and a pixel size of 16.5 nm × 16.5 nm this 
resulted in an electron dose of about 14 electrons nm⁻², not accounting for skirting 
due to low-vacuum operation. The chamber was kept at a pressure of 130 Pa hydrogen 
to prevent charging. The electron microscope was equipped with a custom-made 
microtome, which allows the repeated removal of the block surface at a cutting 
thickness of ≈25 nm. A total of 3,200 consecutive slices were imaged, leading to a data 
volume of 8,192 × 7,072 × 3,200 voxels (a 4 × 4 mosaic of images 2,048 × 1,768 
pixels in size). As the edges of neighbouring mosaic images overlapped by ~1 µm, 
this corresponds to a physical size of about 132 µm × 114 µm for each slice and a 
total thickness of 80 µm. Note that stitching led to substantial shear (about 4 
degrees) in z. 
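
The dose and volume figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code; it assumes that the dose is simply beam current times dwell time divided by the pixel area, and that each of the three tile seams along an axis removes about 1 µm from the stitched extent.

```python
# Worked check of the acquisition numbers quoted in the text (illustrative only).
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron

beam_current_a = 100e-12   # ~100 pA
dwell_time_s = 6e-6        # 6 µs per pixel
pixel_nm = 16.5            # pixel edge length in nm

# Electrons delivered to one pixel, then spread over the pixel area.
electrons_per_pixel = beam_current_a * dwell_time_s / ELEMENTARY_CHARGE_C
dose_per_nm2 = electrons_per_pixel / (pixel_nm ** 2)

# Mosaic geometry: 4 x 4 tiles, neighbouring tiles overlap by ~1 µm (assumed per seam).
tiles = 4
tile_px = (2048, 1768)
overlap_um = 1.0
slice_nm = 25.0
n_slices = 3200


def stitched_extent_um(px_per_tile):
    """Physical extent of one mosaic axis after subtracting the tile overlaps."""
    raw_um = tiles * px_per_tile * pixel_nm / 1000.0
    return raw_um - (tiles - 1) * overlap_um


width_um = stitched_extent_um(tile_px[0])
height_um = stitched_extent_um(tile_px[1])
thickness_um = n_slices * slice_nm / 1000.0

print(f"dose ~ {dose_per_nm2:.1f} electrons per nm^2")
print(f"slice ~ {width_um:.0f} um x {height_um:.0f} um, stack {thickness_um:.0f} um")
```

Under these assumptions the numbers reproduce the quoted values to within rounding.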

The cutting speed was 0.5 mm s⁻¹. To avoid chatter and ensure even cutting, the 
diamond knife (facet angle 50°, clearance angle 20°, Diatome) was vibrated along the 
knife-edge direction with a frequency of ~12 kHz using a small piezo actuator 
integrated into the knife holder. Focus and astigmatism were continually monitored 
(using the ‘heuristic algorithm’ described previously) on the basis of acquired 
images and automatically adjusted. After each cut, a low-resolution overview image 
was acquired and used to automatically detect cutting debris on the surface. If debris 
was detected, the knife was passed over the surface with 40-nm clearance in an 
attempt to remove the debris. Consecutive slices were aligned offline to sub-pixel 
precision by Fourier shift-based interpolation, using cross-correlation-derived shift 
vectors. Note that the sub-volume inside the data set that contains valid data is a 
rhomboid. 

Skeletonization. The data set was prepared as described previously for crowd-sourced 
skeletonization by trained human annotators, who were specifically 
recruited from the local student population. This is different from some other 
‘citizen science’ projects but encountered similar problems, such as the need to 
establish a mechanism for cross-validation. The data were visualized and annotations 
were captured using the KNOSSOS program (http://www.knossostool.org). 
First, all somata in the inner plexiform and ganglion cell layers were identified 
and classified as ganglion, amacrine, bipolar, horizontal and glial cells, using 
the location of each soma and the types of neurites emerging from it. Then, 
starting from the soma, each neuron was traced by multiple tracers (6, 4 and 4 
for ganglion cells, bipolar cells and amacrine cells, respectively). Tracings were 
then consolidated using RESCOP with the following refinements: all edges 
within 3 µm of the soma centre were eliminated, no edges were eliminated 
between 3 and 10 µm, and, except for somata in the ganglion cell layer, branches 
were allowed to pass 15 µm only if their multiplicity (pro votes) compared to the 
maximum multiplicity of any branch leaving the same soma (total votes) was 
acceptable according to the voting rules. Type-grouped skeletons were visually 
scanned using Amira (VSG, Merignac Cedex) and KNOSSOS. For 34 apparently 
aberrant branches, the originating branch points were inspected in the raw data, 
and the branches were removed if erroneous (12 cases). Density profiles were 
calculated by collecting the edge centres into 50-nm-wide bins using the length 
divided by the total vote count for each edge as the weight. Histograms were 
normalized and used to calculate the quartiles. 
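
The density-profile step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the tuple layout of an edge and the choice of reporting quartiles at bin centres are all assumptions made for the example.

```python
def density_profile(edges, bin_width_nm=50.0):
    """Weighted, normalized depth histogram of skeleton edges.

    edges: iterable of (centre_depth_nm, length_nm, total_votes) tuples
    (a hypothetical layout). Each edge contributes length / total_votes
    to the bin that contains its centre.
    """
    bins = {}
    for centre, length, votes in edges:
        idx = int(centre // bin_width_nm)
        bins[idx] = bins.get(idx, 0.0) + length / votes
    total = sum(bins.values())
    # Sort by bin index so that the profile can be accumulated along depth.
    return {idx: weight / total for idx, weight in sorted(bins.items())}


def quartiles(profile, bin_width_nm=50.0):
    """Bin-centre depths (nm) where the cumulative density reaches 25, 50 and 75%."""
    targets = [0.25, 0.5, 0.75]
    result = []
    cumulative = 0.0
    t = 0
    for idx, weight in profile.items():
        cumulative += weight
        while t < len(targets) and cumulative >= targets[t] - 1e-12:
            result.append((idx + 0.5) * bin_width_nm)
            t += 1
    return result


# Hypothetical edges: (centre depth in nm, length in nm, total votes).
example_edges = [(10, 100, 4), (60, 200, 4), (70, 100, 2), (120, 400, 4)]
profile = density_profile(example_edges)
print(quartiles(profile))
```

Dividing by the vote count means that an edge traced redundantly by several annotators does not count several times, which is our reading of why the weight is defined this way.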

Type classification. Cells were visually inspected using views as shown in Fig. 1b. 
The morphological criteria used were the neurite density with depth in the IPL 
and the lateral branching pattern. Connectivity information was used to subdivide 
CBC5s (see below). If a cell could not be grouped with at least one other cell 
it was not assigned to a type and instead added to the ‘orphan’ category, even 
when showing a discernible neurite morphology (Supplementary Data 6). The 
contact data for the 110 orphan cells are not shown in Fig. 4 but are included in 
Supplementary Data 4 and 5. We refer to the types here by their column/row index 
in the type matrix. Supplementary Data 4, sheet 3, and Supplementary Data 7 
provide translation between, respectively, the different indices for individual cells 
and between type indices, type identifiers, and common type names. 

The classification of the cells proceeded as follows. The neurite ramification 
pattern in the IPL, particularly its distribution along the light axis and its overall 
lateral size, was used first. We do not usually comment when cells obviously cluster 
into a type by those criteria (for example, types 9, 33 and 51). Unless otherwise 
indicated, percentage numbers represent position along the light axis. As the 
boundaries of the IPL (0%, 100%), we defined the points where the total skeleton 
density falls below 15% of its maximum. We use the point where the skeleton 
densities of ON and OFF bipolar cells cross over (46.5%) as the ON/OFF boundary. 
In some cases (types 58-62, corresponding to CBC1-4, and in one case for the 
CBC5A versus CBC5R distinction) we used, in addition, tiling (the lateral overlap 
between neurites in the plane of the retina). 
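
The definition of the IPL boundaries given above lends itself to a short sketch. The code below is an assumption-laden illustration, not the published procedure: it takes a sampled density profile, treats the outermost samples that still reach 15% of the maximum as the 0% and 100% points, and maps absolute depths onto that percentage scale. The sample values are hypothetical.

```python
def ipl_boundaries(depths, density, fraction=0.15):
    """Return (start, end): the outermost depths whose density is >= fraction * max."""
    cutoff = fraction * max(density)
    above = [d for d, rho in zip(depths, density) if rho >= cutoff]
    return min(above), max(above)


def percent_depth(depth, start, end):
    """Position along the light axis as a percentage between the two boundaries."""
    return 100.0 * (depth - start) / (end - start)


# Hypothetical total skeleton density sampled every 10 µm along the light axis.
sample_depths = [0, 10, 20, 30, 40, 50, 60]
sample_density = [0.0, 0.1, 1.0, 2.0, 1.5, 0.4, 0.05]

start, end = ipl_boundaries(sample_depths, sample_density)
print(start, end, percent_depth(35, start, end))
```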

First, we identified the ON and OFF SACs (types 33 and 51). We next con- 
sidered all remaining cells that had their somata in the GCL. Because we were 
initially not sure how reliably the axon could be detected, we did not use the 
presence of an axon as a criterion to distinguish ganglion cells from displaced 
amacrine cells. In all but one of the cells classified as ganglion cells an axon was 
found eventually. We begin with the actual ganglion cells (types 1-12), postpon- 
ing the discussion of displaced amacrine cells (types 27, 43, 51 and 56-57). 

There are three clearly bistratified ganglion cell types (2, 8 and 9) that exten- 
sively ramify in both ON and OFF layers. Only type 2 has one of its bands 
immediately adjacent to the INL, whereas the lowest band of type 8 is well 
separated from the INL. Additional discrimination was provided by bands at 
50% and 70% for types 2 and 8, respectively. Only one of the type-8 cells shows 
all aspects of the dendritic tree, whereas the other two cells are presumably 
missing parts of the dendrite inside the reconstructed volume but share enough 
features to put them into the same class. Type 9 is the ON/OFF DSGC. 

Type 6 could be called bistratified but the space between the bands still contains 
a lot of neurite. The two bands are just inside the choline acetyltransferase 
(ChAT) bands, which is where the SACs (types 33 and 51) and DSGCs (type 9) 
ramify. Types 7 and 10 both have only one band straddling the ON/OFF border, 
but 7 has numerous branches going all the way to the INL. Types 7 and 10 
probably correspond to the two subtypes labelled in the TYW3 mouse. 

Next we considered cells that ramify mostly in the OFF (types 1, 3, 4 and 5) or 
the ON layer (types 11 and 12). Among those, only types 1 and 3 ramify all the 
way up to the INL (a slight dendritic resemblance to type 27, a displaced amacrine 
cell, can be resolved by looking at the lateral (in-plane) branching pattern, which 
is much more tortuous for 27). Type 1 has multiple branches emerging directly 
from the soma but type 3 only has a single one. Type 5 has a much denser in-plane 
branching pattern than type 4. Type 11 ramifies further towards the GCL than 
type 5 and is broader than type 10. Type 12 is the only ganglion cell ramifying in a 
single band adjacent to the GCL. 

Among the amacrine cells we started with the narrow-field types (13-24). 
Types 18, 20 and 24 all reach deep into the ON layer and have bands in both 
the ON and OFF layers, which was used to separate them from 23 and 22, with no 
bands in the OFF layer, even though the variability of the OFF band in type 20 
made it difficult to distinguish type 20 and 22 cells, possibly causing some mis- 
classification. Type 18 shows a sharp band at about 70% and a broader band 
touching the INL. Types 13 and 14 were difficult to distinguish, but 14 has a 
clearer gap to the INL and a less dense dendrite. Types 16 and 17 differ in lateral 
size. Some overlap between 16 and 15 cannot be completely ruled out but most 
type 15 cells are shorter and end mostly in a dense band. Types 19 and 21 differ in 
lateral size (21 and 42 may be the same type). 

Next we considered cells (types 25, 28, 30-32, 37, 39, 41, 47, 53 and 57) in which 
the branching pattern suggested wide fields, for example, because only few of their 
branches ended inside the sample. Many of these cells (types 25, 37, 39, 41, 47 and 
53) show a sharp lamination in depth. Only type 25 ramifies close to the INL. 
Type 30 is more strongly branched than 28 and ramifies broadly in depth, unusual 
for wide-field cells. Type 28, unlike type 30, has two branches leaving in opposite 
directions. Type 31 dendrites, uniquely among the cells reconstructed, go off into 
a narrow segment. Type 32 ramifies in the OFF ChAT band, but branches dif- 
ferently from the OFF SAC (type 33). Type 39 has only a single primary branch, 
whereas type 41, which stratifies at almost the same depth, has several. Note that 
types 37, 41/39 and 47 subdivide the space between the ChAT bands into three 
equal sublaminae. 

The remaining amacrine cells are medium-field cells (types 26, 27, 29, 33-36, 
38, 40, 42-46, 48-52 and 54-57), including the unmistakable SACs (types 33 and 
51, see above). Types 34 (an interplexiform cell), 49 and 52 uniquely reach all the 
way across the IPL. Type 49 has the very distinctive ‘waterfall’ anatomy and type 
52 lacks the sharp band right outside the INL of type 34. Types 35 and 38 were 
distinguished by how far their dendrites reached towards the GCL. Types 48 and 
50 differ in primary dendrite shape and in-plane size but may still be the same 
type. Type 45 has more primary dendrites than type 54. 

To classify bipolar cells (types 58-71) we first tried to establish similarity to the 
types described previously. The correspondence was mostly obvious for RBCs 
(type 71) and ON CBCs (types 63-70; see the main text for CBC5 (types 63-65) 
and XBC (type 66)), but rather difficult for OFF CBCs (types 58-62). 

First, all OFF bipolar cells were sorted using the seventy-fifth percentile of the 
cumulative skeleton density in depth (starting from 0%). Then, the lower 58.2% 
(their prevalence; see Table 1 in ref. 28) of cells were placed in the CBC3A/3B/4 
category and the remainder into the CBC1/2 category. The former was then sorted by 
the twenty-fifth percentile. Because this distribution was not clearly separable 
(consistent with the CBC4 width being smaller and more variable than drawn 
previously), we began to collect the CBC4 cells starting at the highest twenty-fifth 
percentile numbers, adding cells consistent with the mosaic until the 
required prevalence was reached. The same procedure was used to separate CBC3A 
from CBC3B, now using the axonal coverage area (reported to be larger for 
CBC3A; ref. 28), and to separate CBC1 from CBC2, using the spread in depth of 
the axon (twentieth to eightieth percentile). Finally, all mosaics were inspected 
again, six cells were reassigned, and one cell (cell 927, Supplementary Data 6) 
was moved to the ‘orphan’ group as it did not fit into any of the mosaics. In the 
resulting grouping, types 60-62 show a ramification-free zone adjacent to the INL 
that is lacking in types 58 and 59. Type 59 dendrites, if anything, are closer to the 
INL than type 58 dendrites. Type 62 ramifies slightly more widely in depth than 
types 60 and 61. Type 61 tends to be smaller than type 60. 
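
The first sorting step for OFF bipolar cells (sort by a depth percentile, then cut the sorted list at a known prevalence) can be sketched as below. This is only a sketch under assumptions: the cell values are invented, the helper name is hypothetical, and reading ‘lower’ as ‘smaller sort value’ is our interpretation. The subsequent mosaic-consistency checks and manual reassignments are not modelled.

```python
def split_by_prevalence(cells, key, fraction):
    """Sort cells by `key` and cut the list so the first part holds `fraction` of them."""
    ordered = sorted(cells, key=key)
    n_first = round(fraction * len(ordered))
    return ordered[:n_first], ordered[n_first:]


# Hypothetical cells: (cell id, 75th percentile of cumulative skeleton density, in % depth).
off_cells = list(enumerate(
    [12.0, 30.5, 18.2, 41.0, 25.3, 36.8, 15.1, 33.9, 21.7, 38.4]))

# 0.582 is the prevalence quoted in the text for the CBC3A/3B/4 category;
# assigning the first (smaller-valued) group to that category is an assumption.
first_group, remainder = split_by_prevalence(
    off_cells, key=lambda cell: cell[1], fraction=0.582)

print(len(first_group), len(remainder))
```
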
Segmentation. A feed-forward convolutional neural network was trained to 
classify connectedness (roughly a probability) between voxels sharing a face (the 
Matlab code and the network weights are in Supplementary Data 8). Several 
sub-volumes (each 100 × 100 × 100 voxels in size) were fully segmented using KLEE 
and served as the initial training data, which was gradually augmented by 
semi-automatically segmented volumes (proofread segmentations generated with earlier 
network versions), yielding a final training set of 12 substantial volumes ranging 
from (128 voxels)³ to (240 voxels)³ (more than 800 million example image patches, 
including translations and rotations). The network contained 5 hidden layers with 
10 feature maps each and was trained for over 5.5 million mini-batch gradient 
update steps until convergence, corresponding to many central processing unit 
(CPU) months, in a greedy and supervised layer-wise manner using a version of 
MALIS modified to assign equal weight to each segment (S.C.T. et al., 
manuscript in preparation). All filters were 7 × 7 × 7 voxels in size and used a 
logistic sigmoid nonlinearity. After classifying voxel connectedness for the whole 
data set, segmentation was as follows. First, the voxels were clustered using a 
threshold of 0.9999. Clusters with ≥10 voxels were used as seeds and grown to a 
threshold of 0.999. Unconsumed voxels were clustered using the same threshold, 
followed by seed selection and growth, now to 0.99. This procedure was repeated 
using thresholds of 0.98, 0.96, 0.94, 0.92, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4 and 0.2, and 
resulted in the assignment of each voxel in the data set to a supervoxel (on average 
517 voxels). Supervoxels were then merged using the following criteria: first, objects larger 
than 36 voxels were merged with each other if the boundary classifier averaged 
across their interface was above a threshold that was gradually lowered from 0.95 
to 0.75 in linear steps of 0.05. In the next phase, only objects of unequal size were 
allowed to merge. The ‘forbidden’ size intervals (in voxels) and the interface 
thresholds for each step were: 2,000-200, 0.65; 2,000-200, 0.6; 2,000-200, 0.55; 2,000-200, 
0.55; 2,000-400, 0.6; 2,000-800, 0.6; 2,000-1,600, 0.6; 2,000-1,600, 0.6; 2,000-1,600, 
0.6; 5,000-2,000, 0.6; 10,000-3,000, 0.6; 20,000-4,000, 0.6; 25,000-5,000, 0.6; 
30,000-6,000, 0.6. 
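
The initial clustering step (joining face-sharing voxels whose predicted connectedness exceeds a threshold, then keeping sufficiently large clusters as seeds) can be illustrated with a small union-find sketch. This is not the Matlab code of Supplementary Data 8. The edge-list representation, the strict `>` comparison and the function names are assumptions, and the subsequent seed growth and supervoxel merging are not implemented here.

```python
def cluster_by_affinity(n_voxels, edges, threshold):
    """Group voxels connected by edges whose affinity exceeds `threshold`.

    edges: iterable of (voxel_a, voxel_b, affinity) for voxels sharing a face
    (a hypothetical flat representation of the classifier output).
    """
    parent = list(range(n_voxels))

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b, affinity in edges:
        if affinity > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for v in range(n_voxels):
        clusters.setdefault(find(v), []).append(v)
    return list(clusters.values())


def select_seeds(clusters, min_size=10):
    """Keep only clusters large enough to act as seeds (at least 10 voxels in the text)."""
    return [cluster for cluster in clusters if len(cluster) >= min_size]


# Toy example: a chain of six voxels with made-up affinities between neighbours.
toy_edges = [(0, 1, 0.99995), (1, 2, 0.99999), (2, 3, 0.5),
             (3, 4, 0.99991), (4, 5, 0.9)]
toy_clusters = cluster_by_affinity(6, toy_edges, threshold=0.9999)
toy_seeds = select_seeds(toy_clusters, min_size=3)  # small min_size for the toy data
print(toy_clusters, toy_seeds)
```

Starting at a very strict threshold and relaxing it step by step means that the most confidently connected voxels define the seeds before weaker evidence is allowed to extend them.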


This increased the average segment size to 2,443 voxels. Each segment was then 
assigned to the skeleton that had the most nodes in the segment (only a small 
fraction contained nodes from more than one skeleton). All segments assigned to 
a skeleton constitute the volume reconstruction of the corresponding cell. The 
volume fraction erroneously assigned was estimated by summing the volume of 
all segments that contained multiple skeletons, weighted by the fraction of mino- 
rity nodes in the segment and divided by the total volume of segments assigned to 
any skeleton. 
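
The assignment rule and the error estimate described above can be written down compactly. The sketch below is illustrative and assumes a hypothetical data layout (a mapping from segment id to the skeleton ids of the nodes it contains, plus a mapping from segment id to volume); it is not taken from the authors' code.

```python
from collections import Counter


def assign_segments(nodes_in_segment):
    """Map each segment to the skeleton contributing the most nodes to it."""
    return {seg: Counter(ids).most_common(1)[0][0]
            for seg, ids in nodes_in_segment.items() if ids}


def misassigned_fraction(nodes_in_segment, volume):
    """Volume of multi-skeleton segments, weighted by their minority-node fraction,
    divided by the total volume of all segments assigned to any skeleton."""
    wrong = 0.0
    total = 0.0
    for seg, ids in nodes_in_segment.items():
        if not ids:
            continue  # segment contains no skeleton node, so it is unassigned
        total += volume[seg]
        counts = Counter(ids)
        if len(counts) > 1:
            majority = counts.most_common(1)[0][1]
            wrong += volume[seg] * (len(ids) - majority) / len(ids)
    return wrong / total


# Hypothetical example: segment id -> skeleton id of every node inside it.
example_nodes = {1: ["a", "a", "a"], 2: ["a", "b", "b", "b"], 3: ["b"], 4: []}
example_volume = {1: 2000, 2: 4000, 3: 2000, 4: 500}  # voxels

print(assign_segments(example_nodes))
print(misassigned_fraction(example_nodes, example_volume))
```

In the example, only segment 2 holds nodes of two skeletons; one of its four nodes is in the minority, so a quarter of its volume counts as erroneously assigned.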

Contact detection. To quantify contacts between segments, segment-to-segment 
overlap matrices were calculated between the original segmentation and versions 
shifted by one voxel in the x, y and z directions, respectively. The resulting three 
collections of overlapping voxels were combined, classified and grouped into 
‘contacts’ (Supplementary Data 8) using a dilation-based proximity measure. The 
contact areas were calculated using the following weights (nm²), according to 
the combination of direction sets in which they occurred: 412.5 (x or y), 
272.25 (z), 583.3631 (x and y), 494.2432 ((x or y) and z), 643.7644 (x, y and z). This 
corrects for the anisotropy in voxel size and to some extent for the error introduced 
by the angle of the contact surface. For surfaces perpendicular to one of the 
principal axes, the face diagonals, or the space diagonal this estimate is exact. 
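
The five weights listed above are consistent with taking the Euclidean norm of the voxel-face areas for the directions involved, given a voxel of 16.5 nm × 16.5 nm × 25 nm. The sketch below reproduces them on that assumption; the function names are ours, and the formula is inferred from the numbers rather than stated in the text.

```python
import math

VOXEL_NM = (16.5, 16.5, 25.0)  # voxel edge lengths along x, y and z (from the text)


def face_area(axis):
    """Area (nm^2) of the voxel face perpendicular to `axis` (0 = x, 1 = y, 2 = z)."""
    others = [VOXEL_NM[i] for i in range(3) if i != axis]
    return others[0] * others[1]


def contact_weight(axes):
    """Weight for a voxel found in the shifted-overlap sets of the given axes,
    taken here as the Euclidean norm of the corresponding face areas."""
    return math.sqrt(sum(face_area(axis) ** 2 for axis in axes))


for label, axes in [("x or y", [0]), ("z", [2]), ("x and y", [0, 1]),
                    ("(x or y) and z", [0, 2]), ("x, y and z", [0, 1, 2])]:
    print(f"{label}: {contact_weight(axes):.4f} nm^2")
```
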
Error estimation. To probe the frequency of missed contacts (false negatives) we 
selected 100 random locations on one skeleton (cell 17, gc36-51, W3) and 
searched for true contacts with a cell that was highly connected to it according 
to the cell-cell matrix (cell 344, ac34-84). All 16 true contacts found were also 
found by the automated detection routine. To estimate the false-positive rate we 
randomly selected 20 of the 7,217 contacts that the same ganglion cell made with 
other cells and visually inspected the corresponding locations in the raw data. In 
one case no actual contact existed (a piece of debris was erroneously attributed by 
the segmentation routine to cell 344). 

Sizes for synaptic and non-synaptic contacts. Synaptic and non-synaptic con- 
tacts in the conventionally stained data set (k563) were selected and their contact 
area determined in one of two ways. (1) Starting from a bipolar cell axon terminal, 
a synaptic ribbon was located (Fig. 3d), the two postsynaptic dyadic partners were 
found and their class determined, using the presence or absence of synaptic 
vesicles (found in amacrine but not ganglion cell dendrites). All three dyadic 
partners and, in addition, a nearby non-synaptic contact were manually recon- 
structed using the KLEE software tool (M.H. et al., manuscript in preparation) in 
a region including all three contacts. The contact areas were determined as 
follows. Surface triangulations were generated for each volume reconstruction. 
For each triangle it was then determined whether there was another object within 
144 nm above it, and the contact area with this object was calculated as the sum 
over all hits in that object weighted by the triangle areas. (2) All contacts with 
bipolar cells were reconstructed on several pieces of ganglion cell dendrite, quan- 
tified, and classified as synaptic when a ribbon was present and non-synaptic 
otherwise. Classification, segmentation and contact detection were performed 
independently for each member of a set of overlapping cubes (257 voxels on a 
side), one cube for each interior data cube (128 voxels on a side). Each of those 
cubes overlaps one data cube completely and 26 cubes partially. To avoid double 
counting, we counted a contact only when the largest part of the contact was 
inside the completely overlapped (central) data cube. 
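
The rule for avoiding double counting across overlapping processing cubes can be sketched as follows. This is an illustration under assumptions: contacts are represented as lists of global voxel coordinates, data cubes are indexed by integer division of those coordinates by 128, and ties between data cubes are not handled specially. None of these details are specified in the text.

```python
from collections import Counter

DATA_CUBE = 128  # voxels on a side (from the text)


def owning_data_cube(contact_voxels):
    """Index (i, j, k) of the data cube that contains the largest part of a contact."""
    counts = Counter(tuple(c // DATA_CUBE for c in voxel) for voxel in contact_voxels)
    return counts.most_common(1)[0][0]


def count_contact(contact_voxels, central_cube):
    """A processing cube counts a contact only if its central data cube owns it."""
    return owning_data_cube(contact_voxels) == central_cube


# Hypothetical contact straddling the border between data cubes (0,0,0) and (1,0,0).
straddling = [(126, 10, 10), (127, 10, 10), (128, 10, 10), (129, 10, 10), (130, 10, 10)]
print(owning_data_cube(straddling))
```

Because every contact has exactly one owning data cube under this rule, it is counted once even though several overlapping processing cubes detect it.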
A visual motion detection circuit 
suggested by Drosophila connectomics 
Animal behaviour arises from computations in neuronal circuits, but our understanding of these computations has 
been frustrated by the lack of detailed synaptic connection maps, or connectomes. For example, despite intensive 
investigations over half a century, the neuronal implementation of local motion detection in the insect visual system 
remains elusive. Here we develop a semi-automated pipeline using electron microscopy to reconstruct a connectome, 
containing 379 neurons and 8,637 chemical synaptic contacts, within the Drosophila optic medulla. By matching 
reconstructed neurons to examples from light microscopy, we assigned neurons to cell types and assembled a 
connectome of the repeating module of the medulla. Within this module, we identified cell types constituting a 
motion detection circuit, and showed that the connections onto individual motion-sensitive neurons in this circuit 
were consistent with their direction selectivity. Our results identify cellular targets for future functional investigations, 
and demonstrate that connectomes can provide key insights into neuronal computations. 


Vision in insects has been subject to intense behavioural, physiological 
and anatomical investigations, yet our understanding of its underlying 
neural computations is still far from complete. One such computation, 
ethologically highly relevant, is motion detection, which is thought 
to rely on the comparison between signals offset in space and time 
(Fig. 1a, b). Yet, despite being the focus of theoretical and experimental 
investigations for more than half a century, the exact mechanism 
underlying this computation remains a mystery. A central impediment 
towards unravelling this mechanism has been our incomplete knowledge 
of the relevant neurons and synapses. 
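
The idea of comparing signals offset in space and time can be made concrete with a toy delay-and-multiply detector of the Hassenstein-Reichardt kind. The sketch below is a deliberately minimal half-detector with a discrete delay; the function name, the sample signals and the use of a pure sample shift in place of a temporal filter are assumptions made for illustration.

```python
def emd_rightward(left, right, delay):
    """Delay the left channel by `delay` samples, multiply by the right channel, and sum."""
    total = 0.0
    for t in range(len(right)):
        delayed_left = left[t - delay] if t >= delay else 0.0
        total += delayed_left * right[t]
    return total


# A brief light pulse passes the left input at t = 2 and the right input at t = 4.
rightward_left = [0, 0, 1, 0, 0, 0]
rightward_right = [0, 0, 0, 0, 1, 0]

# The same pulse travelling leftward reaches the right input first.
leftward_left = [0, 0, 0, 0, 1, 0]
leftward_right = [0, 0, 1, 0, 0, 0]

print(emd_rightward(rightward_left, rightward_right, delay=2))
print(emd_rightward(leftward_left, leftward_right, delay=2))
```

With the delay matched to the travel time, the two signals coincide at the multiplication stage only for rightward motion, so the detector responds to that direction and stays silent for the other.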

In the fly, visual processing begins in the optic lobe, composed of 
four retinotopically organized neuropils. Each is an array of repeating 
modules corresponding to the hexagonal lattice of ommatidia in the 
compound eye (Fig. 1c). Each module of the first neuropil, the lamina, 
contains a repeating circuit receiving inputs from six photoreceptors 
detecting light from the same location in the visual field. The 
output cells of each lamina module project to a corresponding module 
of the second neuropil, the medulla (Fig. 1c). Each medulla module, 
called a column (Fig. 1d), is also thought to contain stereotypic circuits. 
These columns, in turn, innervate two downstream neuropils, the lobula 
and lobula plate (Fig. 1c). 

Responses to local motion must be computed at least partly within 
the stereotypical circuits of the medulla columns. Indeed, the medulla 
is the first neuropil with movement-specific activity, and, directly 
downstream of the medulla, lobula plate tangential cells (LPTCs) integrate 
local motion signals to produce wide-field motion responses. 
However, up until now, the lack of a medulla connectome has frustrated 
investigations of local motion detection. 


Semi-automated connectome reconstruction 


To provide a reliable foundation for computational modelling and 
identify targets for electro/optophysiological recordings, we attempted 
a complete, dense reconstruction of the chemical synaptic connectivity 
within the medulla using electron microscopy, the gold standard of 
neuroanatomy. Given the time-consuming nature of such reconstructions, 
we wanted to determine the smallest medulla volume, 
reconstruction of which would allow us to identify a circuit underlying 
the computation of local motion. Both directional turning responses 
and electrophysiological responses in LPTCs can be elicited in flies by 
sequential stimulation of two photoreceptors corresponding to adjacent 
points anywhere in the visual field. This suggests that some 
repeating component of the motion detecting circuit must be present 
within any two adjacent medulla columns. We therefore decided to 
reconstruct all the synaptic connections among neurons within a single 
reference column, as well as all the connections between the reference 
column and neurons within six nearest-neighbour columns (Fig. 1d). 

Because manual reconstruction of even a seven-column volume would 
be prohibitively time-consuming, we developed a semi-automated 
reconstruction pipeline and applied it to the medulla volume (Fig. 2, 
Methods and Supplementary Data 1), reconstructing 379 cells 
(Supplementary Fig. 1 and Supplementary Video 1). 

To map our reconstruction onto the existing body of knowledge, we 
assigned these cells to previously proposed cell types”® by comparing 
the shapes of reconstructed arbors (Supplementary Fig. 1) with those 
reported from light microscopy using Golgi impregnation or genetic 
single-cell labelling (Fig. 2e, fand Supplementary Methods). Because 
there were several reconstructed examples for almost all neuronal 
types (Supplementary Fig. 1 and Supplementary Table 2), it was pos- 
sible to characterize the common structural features of each type. In 
many cases, this allowed us to match unequivocally a reconstructed 
cell with a Golgi impregnate”, for which it was then named (Sup- 
plementary Methods). However, there was also a subset of cell types 
for which a Golgi counterpart could not be found but which we 
validated using isomorphs from genetic single-cell labelling. We 
named these cell types Mi13, Mi14, Mi15, TmY14, Dm9, and Dm10
(Supplementary Fig. 2). In total, from the collection of 379 recon- 
structed cells (Supplementary Fig. 1) we were able to classify 290 of 
them into 56 cell types (Supplementary Table 2).

Figure 1 | Motion detection and the Drosophila visual system. a, Rightward
motion component of the Hassenstein-Reichardt EMD model. Light input
(lightning bolt) into the left channel (magenta) is transmitted with an
additional delay, τ, relative to that into the right channel (cyan). For a
rightwards-moving object, signals from both channels will arrive at the
multiplication unit closer in time to each other, and therefore become
nonlinearly enhanced (and vice versa for leftwards-moving objects). As a result,
the model responds preferentially to rightward motion. b, Alternative
Barlow-Levick-like EMD model, also preferring rightward motion. Note that the
inputs are combined with opposing signs and the delay is now in the right
(cyan) channel. c, Bodian silver-stained horizontal section of the Drosophila
melanogaster visual system revealing the four neuropils of the optic lobe. The
medulla region of interest (solid rectangle, expanded in d) and the wider
imaged volume (dashed rectangle) used to trace into the lobula plate are shown
schematically. d, The 37 µm × 37 µm medulla region of interest is centred on
the reference column (red) and six surrounding nearest-neighbour columns
(blue). The medulla has ten strata (M1-M10) defined by the arborizations of its
cell types. Scale bars, 50 µm (c) and 10 µm (d, in all three directions).

Figure 2 | Connectome reconstruction using serial-section electron
microscopy. a, A representative micrograph, one of 2,769 from the electron
microscopy series. b, Proofread segmentation of the micrograph in a into
neurite profiles (single colours). c, Synapses comprise a presynaptic process
containing a T-bar ribbon (red arrow) and associated neurites with
postsynaptic densities (PSDs) (blue arrowheads) adjacent to the T-bar. A
non-synaptic process (green circle) lacks a PSD (in both this and other section
planes containing this T-bar). d, Neurites are reconstructed by linking profiles in
consecutive sections (left), to construct a 3D object (right). e, An example of a
neuron reconstructed from electron microscopy (left), identified by
comparison with the Golgi impregnated cell (centre) as type Mi1 and
cross-validated by a corresponding genetic single-cell (GSC)-labelled neuron (right)
(Supplementary Methods). f, Same as e for cell type Tm3. Scale bars, 500 nm
(a, b), 250 nm (c) and 10 µm (e, f).


To reveal the connections between the 379 reconstructed neurons, 
we identified pre- and postsynaptic sites and assigned them to their 
respective parent cells. Within the reference column and its imme- 
diate surround, we annotated 10,093 presynaptic sites and 38,465 
associated postsynaptic sites (3.8 ± 1.2 (mean ± s.d.) per presynaptic
site) (Fig. 2c and Supplementary Table 1). Although presynaptic 
T-bars typically fell onto proofread profiles of neurons, postsynaptic 
sites usually fell onto isolated profiles, unassigned to any neuron. 
Thus, it was necessary to trace the dendrite containing each post- 
synaptic site back to a parent cell. This postsynaptic tracing was 
extremely challenging as Drosophila neuron dendrites branch elabo- 
rately and, indeed, can be thinner than the section thickness. 

The challenging postsynaptic tracing led to (1) some erroneously 
identified synaptic contacts, and (2) a high fraction (~50%) of con- 
tacts that could not be traced to their parent neuron and were therefore 
unidentified. To increase our confidence in the identified contacts (1), 
we had two proofreaders trace every postsynaptic site (Methods), and 
only accepted into the connectome those contacts that both proof- 
readers identified independently. By contrast, it was not possible 
to reduce the number of unidentified contacts experimentally (2). 
However, we were still able to construct a connectome valuable for 
inferring function because we found that, within the medulla, connec- 
tions of high weight (that is, high number of synaptic contacts per 
connection) both capture a large fraction of the total connection weight 
and can be identified with high fidelity. Indeed, the distribution of 
connection weight in our connectome is heavy tailed (Fig. 3b, inset, 
and Supplementary Fig. 3), as has been found in other organisms*’”’. 
Also, assuming that synapses are equally difficult to proofread, we 
found that any strong connection (with >5 synaptic contacts) will 
be identified with >95% probability. Therefore, in the resulting con- 
nectome, 8,637 synaptic contacts are precisely identified, and all strong 
connections are represented. 
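The reasoning behind the >95% figure can be illustrated with a short calculation. The sketch below is not the authors' procedure (which is given in their Methods); it assumes, for illustration only, that each synaptic contact is traced to its parent cell independently with probability p = 0.5, a value taken from the ~50% fraction of unidentified contacts quoted above. The function name `detection_probability` is hypothetical.

```python
def detection_probability(n_contacts: int, p_identified: float = 0.5) -> float:
    """Probability that at least one of n_contacts is traced to its parent cell.

    Assumption (ours, not the paper's stated method): contacts are identified
    independently, each with probability p_identified.
    """
    if n_contacts < 0 or not 0.0 <= p_identified <= 1.0:
        raise ValueError("need n_contacts >= 0 and 0 <= p_identified <= 1")
    # The connection is missed only if every single contact is missed.
    return 1.0 - (1.0 - p_identified) ** n_contacts


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, round(detection_probability(n), 3))
```

Under these assumptions a connection with five contacts is already detected with probability of about 0.97, and one with six contacts with about 0.98, which is consistent with the statement that strong connections are captured with high fidelity.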


The connectome module and its pathways 

To identify pathways performing local computations such as motion 
detection, it was necessary to generate a more convenient abstraction 
of the full connectome. Because we expect that the circuits of interest 
repeat within each column, we extracted from the medulla connec- 
tome a periodic module of connections between identified cell types 
that arborize in every column. These include both so-called synper- 
iodic cell types with single neurons in every column of the medulla”, 
and cell types with several members within each column, which we
term ultraperiodic. We do not include infraperiodic tangential or local
amacrine-like cells even if they have arborizations in every column
because this cannot be determined unambiguously from our electron
microscopy reconstruction (Supplementary Table 2). We used the
existence of multiple representatives from adjacent columns within
our electron microscopy reconstruction (Supplementary Fig. 1) to
identify 25 cell types as synperiodic, as well as two cell types, Tm3
and T4, as ultraperiodic (with 1.5 and 4 cells per column, respectively)
(Supplementary Table 2). We termed these 27 cell types modular.

Assuming that connections between modular cells are stereotypical
between columns, we constructed the repeating circuit module by
finding all connections between these cell types (Methods). Unlike
sparse reconstructions, the resulting connectome module (Fig. 3a)
accurately captures not only the presence but also the absence of
strong connections between any two cell types.

To determine which neurons could be involved in different local
computations, we dissected the connectome module into three separate
signal processing pathways, using both a clustering and a layout
algorithm (Fig. 3). We recognized them as the previously identified
pathways, those of L1, L2 and L3/R7/R8. The downstream targets of
R7 and R8 have previously been implicated in colour vision. Because
colour pathways are separated into differing columns receiving inputs
from either pale or yellow ommatidia, we expect that they should
rely on infraperiodic cell types omitted from our connectome module.
Therefore, the fine structure of the L3/R7/R8 pathway will be revisited
elsewhere.

The remaining L1 and L2 pathways signal visual contrast, and are
implicated in motion detection. Behavioural experiments and
electrophysiological recordings confirm this role for L1 and L2: not
only is each necessary for aspects of motion detection, but, among
the cells postsynaptic to the photoreceptors R1-R6, both are also
mostly sufficient for the computation. However, L1 and L2 themselves
lack directionally selective responses. Therefore, to search for
motion detection circuit(s) within the connectome module, we examined
the neurons downstream of L1 and L2 in more detail.

Figure 3 | Medulla connectome module. a, Synaptic connectivity matrix for
modular cell types assembled from 2,495 synapses (Supplementary Table 1).
Three pathways, identified via the Louvain clustering analysis, are labelled by
coloured boxes. They are named by their primary input neuron(s): the L1
(magenta), L2 (green) and L3/R7/R8 (cyan) pathways. The pathways are
ordered by the total number of connections within a pathway, in descending
order, and the cell types, within each pathway, are ordered by the sum of their
pre- and postsynaptic connections to and from other cell types within their
pathway, also in descending order. b, Medulla connectome module as a 3D
graph. Cell types with stronger connections are positioned closer to each other,
using the visualization of similarities layout algorithm. Three spatially
segregated groups are observed that closely match the pathways identified
through clustering (colouring of spheres). The dominant direction of signal
flow is oriented into the page. Inset in b shows the fraction of synaptic
connections within the full connectome having a connection weight greater
than indicated.


Candidate motion detection circuit 


Several lines of evidence indicate that motion information computed 
downstream of L1 and L2 is relayed to the lobula plate via cell types 
T4 and T5 (ref. 25). First, recordings from LPTCs in fruitflies with 
genetically silenced T4 and T5 demonstrate that at least one of these 
columnar cell types is necessary to detect direction selectivity’. Second, 
T4 and T5 cells in Drosophila each comprise four subtypes differen- 
tiated by the lobula plate layer in which their axons arborize”’. Third, 
each of these four layers within the lobula plate exhibits activity in 
response to wide-field stimuli moving in a particular direction: down- 
wards, upwards, backwards and forwards (Fig. 4b, e), revealed by their 
uptake of deoxyglucose’'”’. Finally, dendrites of LPTCs with different 
motion preference co-occupy the lobula plate layers corresponding to 
their directional preference”’ and, in addition, receive direct synaptic 
connections from T4 terminals**. Collectively, these data suggest that 
each subtype of T4/T5 forms the output of motion detection circuits 
signalling a particular direction of motion. 

Next, we argue that the direction-selective outputs of T5 and espe- 
cially T4 are computed largely independently of each other. Consistent 
with stratum-overlap analysis in Drosophila and large flies, our
connectome (Fig. 3a, b) indicates that T4 belongs to the L1 pathway.
Conversely, although we did not trace into the lobula, the dendrites of
T5 cells co-occupy layers in the lobula with the axon terminals of
neurons such as Tm1 (refs 20, 25), Tm2 (ref. 20) and Tm4 (ref. 20),
which belong to the L2 pathway (Fig. 3b). Electrophysiological and
behavioural evidence indicate that the L1 and L2 pathways in the
connectome module (Fig. 3a, b) are computationally independent
and correspond to ON and OFF pathways in the visual systems of
vertebrates. Furthermore, most of the connections from the L2 to
the L1 pathway arrive via L5 (Fig. 3a), a cell type not implicated in
motion detection. Hence, we decided to search for a motion detection
circuit downstream of L1 converging on T4 cells. This decision was
made despite electrophysiological evidence showing a lack of direction
selectivity in T4 cells, but since the completion of our work, has been
supported by calcium imaging of T4 cells.

Figure 4 | Spatial displacement of Mi1- and Tm3-mediated inputs onto a
single T4 cell (T4-12). a, Bottom view of dendrites of the Mi1 (cyan) and Tm3
(magenta) neurons presynaptic to T4-12, overlaid on the array of L1 axonal
terminals (yellow). The colour saturation for each dendritic arbor reflects the
number of synaptic contacts made onto T4-12 (see b and d). Arrow shows the
displacement from the Tm3 centre of mass to the Mi1 centre of mass computed
as illustrated in e. b, Side view of T4-12 and its presynaptic Mi1 and Tm3 cells.
Direction preference for a T4 cell (coloured to match the directional
preferences in e) is determined by the lobula plate arborization layer of the axon
terminals (T-bars in black). c, Enlargement (dashed rectangle in a) showing
reconstructed neurites of Mi1, Tm3 and L1 cells (without the weighted colours
in a), and their synaptic contacts (L1 → Mi1: blue; L1 → Tm3: red).
d, Reconstructed dendritic arbor of T4-12 with synapses from Mi1 (blue) and
Tm3 (red). e, Cartoon of inputs to a single T4 cell through Mi1 and Tm3 cells.
Mock synaptic weights illustrate how the receptive fields were computed. The
centre of mass of Mi1 (or Tm3) component, blue (or red) circle, is computed by
placing the mass corresponding to the compound synaptic weight from L1
through Mi1 (or Tm3) to T4 at the centre of the corresponding column. Scale
bars, 8 µm (a), 8 µm (b), 1 µm (c) and 4 µm (d).

To identify candidate elements of a motion detection circuit bridg- 
ing between L1 and T4, we took advantage of the fact that motion 
detection is both fast and robust to noise’®”’, and consequently should 
be implemented by both feedforward and strong connections. Our 
dense electron microscopy reconstruction identified five cell types with 
significant input from L1: Mi1, Tm3, L5, C2 and C3 (Fig. 3a). Cell types
Mi1 and Tm3 are the two largest recipients of L1 input, together
accounting for more than half of the synaptic contacts of L1. In turn,
Mi1 and Tm3 together contribute >80% of presynaptic inputs to T4
(including all inputs from both modular and non-modular cell types),
thus forming the two strongest paths from L1 to T4. By contrast, cell
type C3 contacts T4 with an order of magnitude fewer synapses than
Mi1 and Tm3, suggesting that its contribution is far weaker. Finally,
cell types L5 and C2 have no synapses directly with T4. These features
lead us to suggest that Mi1 and Tm3 are the primary substrates for robust
and fast motion detection within the L1 pathway.


Anatomical receptive fields of T4 cells 


To explore further whether Mi1 and Tm3 cells converging on T4 cells
could constitute the two arms of a correlation-based motion detector 
(Fig. 1a, b), we examined whether the motion axis defined by these 
inputs onto a particular T4 is consistent with its preferred direction, as 
measured by its outputs. This output-preferred direction was deter- 
mined for 16 T4 neurons by tracing their axons into the lobula plate 
and identifying their arborization layer’'”’ (Fig. 4b, e). 

To compare the preference of T4 output with the motion axis 
arising from its inputs, we constructed the input motion axis by 
analysing the numbers of contacts from individual Mi1 and Tm3 cells
onto the T4 neuron in question. We found that each T4 receives
inputs from several Mi1 and Tm3 neurons, suggesting that, unlike
the circuits in Fig. 1a and b, several points in the visual field provide 
inputs into each arm of the motion detector (Fig. 4a, b, e). This 
observation is supported by the structure of sampling units inferred 
from recordings in blowfly H1 (ref. 36), which receives inputs from 
LPTCs*’. We therefore needed to characterize the inputs to each T4 as 
Mi1- and Tm3-mediated receptive field components, mapped into
the visual field. To do this, we traced synaptic connections from L1
terminals in 19 columns (the reference column and surrounding 18
columns) to the downstream Mi1 and Tm3 cells and then from the
Mi1 and Tm3 neurons onto T4 cells that receive input from the
reference column (which also happen to number 19). The resulting 
receptive fields (Fig. 4a and Supplementary Fig. 4) show the T4 inputs 
mapped as if on the L1 array and hence into the visual field. 

For all T4 cells, the Mi1- and Tm3-mediated components of the T4
receptive field overlap substantially with one another (Supplementary 
Fig. 4). Indeed, the centres of mass of the two components are dis- 
placed less than one inter-ommatidial distance (Fig. 5a). However, for 
15 of the T4 cells, this displacement is still significantly greater (P < 
0.05) than would have been obtained by chance from tracing errors. 
Such a small displacement magnitude relative to the widths of the 
receptive fields agrees with previous evidence inferred from blowfly 
H1 recordings and has been justified theoretically**. 
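The receptive-field construction and the displacement between the two components can be sketched as follows. This is an illustration with mock contact counts, not the authors' code. It assumes that the compound synaptic weight of a path L1 → intermediate cell → T4 is the product of the two contact counts, and that each column's weight is placed at the column centre, as described for Fig. 4e. The function names, cell labels and coordinates are hypothetical.

```python
from collections import defaultdict


def receptive_field(l1_contacts, t4_contacts):
    """Map one input component of a T4 cell onto the array of L1 columns.

    l1_contacts: {(column, cell): contacts from that column's L1 onto the cell}
    t4_contacts: {cell: contacts from that cell onto the T4 neuron}
    Assumption: compound weight = product of the two contact counts.
    """
    field = defaultdict(float)
    for (column, cell), n_l1 in l1_contacts.items():
        field[column] += n_l1 * t4_contacts.get(cell, 0)
    return dict(field)


def centre_of_mass(field, column_xy):
    """Weighted mean position, with each weight placed at its column centre."""
    total = sum(field.values())
    if total == 0:
        raise ValueError("receptive field has no weight")
    x = sum(w * column_xy[col][0] for col, w in field.items()) / total
    y = sum(w * column_xy[col][1] for col, w in field.items()) / total
    return x, y


def displacement(tm3_field, mi1_field, column_xy):
    """Vector from the Tm3 centre of mass to the Mi1 centre of mass."""
    tx, ty = centre_of_mass(tm3_field, column_xy)
    mx, my = centre_of_mass(mi1_field, column_xy)
    return mx - tx, my - ty


# Mock data: two columns one inter-ommatidial distance apart.
column_xy = {"A": (0.0, 0.0), "B": (1.0, 0.0)}
mi1_field = receptive_field(
    {("A", "Mi1-A"): 10, ("B", "Mi1-B"): 10},
    {"Mi1-A": 4, "Mi1-B": 12},
)
tm3_field = receptive_field(
    {("A", "Tm3-a"): 6, ("B", "Tm3-a"): 2, ("B", "Tm3-b"): 4},
    {"Tm3-a": 5, "Tm3-b": 5},
)
dx, dy = displacement(tm3_field, mi1_field, column_xy)
print(dx, dy)
```

With these mock counts the two components overlap in both columns, yet their centres of mass differ by a quarter of the column spacing, which mirrors the observation that the displacement is small relative to the width of the receptive fields.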

Is the direction of displacement between the Tm3 and Mi1 receptive-field
components for a T4 neuron consistent with the directional preference
of the neuron, as defined by the depth of its terminal axonal
arborization in the lobula plate? Assuming that the direction of the
displacement is determined from the Tm3 to Mi1 component centres
of mass, Fig. 5b (top) shows that the direction of displacement agrees
with the directional preference for three of the four lobula plate layers.
The discrepancy between the direction of displacement and the front-to-back
motion preference of T4 cells terminating in the fourth layer (lobula
plate layer 4, LP4) may be caused by neglected circuits contributing
specifically to the responses of these T4 neurons. For example, behavioural
evidence implicates C3 neurons in front-to-back motion detection,
and, indeed, preliminary tracing results indicate that, despite C3 cells
providing an order of magnitude fewer inputs to T4 cells than Mi1 or
Tm3 cells, C3 neurons target LP4 T4 cells preferentially.

Figure 5 | Computed displacements for all T4 cells. a, Displacement vectors
for each T4 neuron (n = 19). Neurons with significant displacement
(names in bold) have 95% confidence intervals (ellipses) that exclude the
origin (Methods). The vectors are in the ommatidial frame of reference
(within ~30° of the visual axes). b, Top, mean displacement, computed from
a, averaged over the cells with the same preferred direction of their
output. Bottom, the angular difference between the spatial displacement
for individual T4 neurons and the preferred direction of its output (for
lobula plate layers 1-3 (LP1-3)) correlates with the fraction of missing
L1 inputs (Methods). T4 cells with >15% of missing L1 inputs were omitted
from the mean displacements (top). Scale bars, 0.5 (a) and 0.2 (b) of the
centre-to-centre distance between adjacent facets.

Some of the discrepancy between the receptive field offset and the
directional preference of individual T4 cells (Fig. 5a) innervating
lobula plate layers 1-3 is probably caused by a systematic error in
our reconstruction resulting from the finite size of the reconstructed
region. Indeed, the discrepancy between the measured input displacement
for a given T4 and its predicted direction preference correlates
with the weighted fraction of missing L1 inputs onto Tm3 cells
upstream of that T4 (Fig. 5b, bottom), supporting the view that the
fields of peripheral T4 cells may not be fully reconstructed. In addition,
some of the remaining variation in the offset orientation may
also be real, given the observed 60°-90° half-width of the tuning
curves obtained through calcium imaging of individual T4 subtypes.

Our choice to measure displacement from Tm3 to Mi1 (and not in
the reverse order) seems arbitrary without including information
about delays and synaptic polarity (Fig. 1a, b). To estimate a possible
conduction delay, we measured both the path length and the calibre of
the main axon trunks that conduct signals along the Mi1 and Tm3
cells from L1 synapses to T4 synapses and found them to be similar,
within 10% of each other. Moreover, using a range of electrotonic
parameters measured in other fly neurons, the corresponding cable
delays were still only on the order of one millisecond, an order of
magnitude less than that required for motion detection. Furthermore,
although some neurotransmitters have been identified for the cell types
involved, we do not know their associated receptors and hence the
resulting synaptic polarity.

In the absence of evidence for both relative delay and synapse polarity,
we are free to choose the measurement direction from Tm3 to Mi1,
which leads to spatial displacements consistent with the directional
preference predicted by the depth of the T4 terminal in the lobula plate.
Assuming that the Mi1 and Tm3 inputs to T4 are combined with the
same sign, as in the Hassenstein-Reichardt elementary motion
detector (EMD) model (Fig. 1a), we predict that the Tm3 arm of the
motion detector should introduce a longer delay than the Mi1 arm. If,
however, the inputs were combined with opposing signs, as in the
Barlow-Levick-like EMD model (Fig. 1b), then our prediction would
be the opposite. Regarding the mechanism of the delay, having the two
arms of the circuit implemented by different cell types allows the
possibility that the delay may be implemented biologically by means
of metabotropic receptors, as reported in the vertebrate retina.

Exploring the reconstructed T4 cells, we identified a hitherto
unrecognized feature of their medulla dendritic arbors (see, however,
T4a,d in Fig. 14 of ref. 20, and N. J. Strausfeld, personal
communication): the dendritic branches of each T4 neuron are oriented primarily
in one direction (Fig. 6a and Methods). Moreover, the branch
orientation of each T4, measured from the dendrite tips to their bases,
clusters around one of four directions (Fig. 6b). These four directions,
when mapped from the medulla coordinate frame onto the visual field
(Fig. 6c), align with the output direction preference for each lobula plate
layer (Fig. 6b). This observation allowed us to cross-validate the
classification of each of the 16 T4 cells into direction preference subtypes
(Fig. 6a, b and Supplementary Fig. 5). We then used this observation to
infer a direction preference for the remaining three T4 cells, for which
tracing into the lobula plate could not be completed (Fig. 6c).

Figure 6 | Orientation of medulla dendritic arbors of T4 neurons correlates
with axon terminal arborization layer in the lobula plate. a, Four
representative medulla dendritic arbors of T4 cells. The colours represent local
dendritic branch orientation. The colour map was constructed by assigning
colours from each lobula plate layer (Fig. 4e) to the average dominant branch
orientation over all neurons in each layer (arrows within colour map) and
smoothly interpolating. b, Depth of axonal arbor of a T4 neuron within the
lobula plate correlates with dominant dendritic branch orientation in the
medulla (Methods). In the depth axis, four layers are labelled and the neurons
within each layer are coloured as in Fig. 4e. The dominant orientations of
neurons with axons not traced to the lobula plate are plotted on the x axis (1:
T4-5, 2: T4-4, 3: T4-14), and they are coloured with the colour of the cluster to
which they most likely belong (Supplementary Fig. 5). c, Transforming the
dominant dendritic orientation (±s.e.m.) from the space defined by the array of
medulla columns (in layer M10) to the directions in visual space. Scale bar,
5 µm.


Discussion


This report has introduced a new high-throughput, semi-automated 
pipeline for electron microscopy reconstruction, and applied it to 
reconstruct a connectome module comprehensively within the medulla, 
a neuropil that has long resisted such attempts. Furthermore, using 
connectomics, we identified Mi1 and Tm3 inputs to T4 neurons as the
two arms of a candidate correlation-based motion detector. Although 
anatomy alone does not allow us to probe the nonlinear operation or 
the time delay, and hence distinguish between different correlation- 
based models**”, we were able to predict which cell type should intro- 
duce a longer delay, given their synaptic polarities (Fig. 1a, b). 
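Why the predicted delay arm depends on synaptic polarity can be seen in a toy discrete-time version of the two models in Fig. 1a, b. The sketch below is ours and deliberately minimal: inputs are single-step pulses, the delay is a pure shift of `tau` time steps, and the opposing-sign model is followed by half-wave rectification, which is an added assumption not stated in the text. All function names are hypothetical.

```python
def pulse(length, t):
    """Signal of the given length that is 1.0 at time step t and 0.0 elsewhere."""
    signal = [0.0] * length
    signal[t] = 1.0
    return signal


def delayed(signal, tau):
    """Shift a signal later in time by tau steps, padding the start with zeros."""
    if tau == 0:
        return list(signal)
    return [0.0] * tau + list(signal[:-tau])


def hr_response(left, right, tau):
    """Fig. 1a style: the left arm is delayed, then the arms are multiplied."""
    return sum(a * b for a, b in zip(delayed(left, tau), right))


def bl_response(left, right, tau):
    """Fig. 1b style: the right arm is delayed and subtracted from the left.

    Half-wave rectification of the difference is our assumption.
    """
    return sum(max(0.0, a - b) for a, b in zip(left, delayed(right, tau)))


# An edge crossing two neighbouring points two time steps apart.
n, tau = 20, 2
rightward = (pulse(n, 5), pulse(n, 7))  # left point is stimulated first
leftward = (pulse(n, 7), pulse(n, 5))   # right point is stimulated first

print("HR:", hr_response(*rightward, tau), hr_response(*leftward, tau))
print("BL:", bl_response(*rightward, tau), bl_response(*leftward, tau))
```

Both toy detectors respond to rightward and not to leftward motion, but the delay sits in the arm stimulated first in the multiplicative model and in the arm stimulated second in the subtractive one. This is the sense in which the same anatomical displacement predicts a longer delay in opposite arms under the two models.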

Analogous to our proposed circuit downstream of L1, the connec- 
tions within our electron microscopy reconstruction allow us to sug- 
gest candidate cell types—Tm1, Tm2 and Tm4—that may constitute 
the motion detection circuit downstream of L2. Confirmation of this 
suggestion must of course await dense reconstruction of the connec- 
tions onto T5 neurons in the lobula. 

This report has several interesting parallels with results from verte- 
brate retinae. First, the existence of the four subtypes of T4 cells 
responding to the four cardinal directions of motion is reminiscent 
of the four subtypes of ON-OFF directionally selective ganglion cells 
(DSGCs) in the rabbit retina*’. Second, our finding that directional 
selectivity of T4 cells is aligned with their dendritic orientation is 
reminiscent of JAM-B ganglion cells* and starburst amacrine cells 
(SACs)* in the mouse retina. However, unlike JAM-B and SAC cells, 
the preferred direction of T4 cells is away from the tip of the dendrites 
and, unlike SAC cells but like JAM-B cells, all dendrites in one T4 point 
in the same direction. Third, the highly specific connections between 
SACs and DSGCs responsible for the directional selectivity of the latter 
were also demonstrated previously using large-scale connectomics™. 
However, unlike the SAC-to-DSGC circuit, the circuit we report may 
compute directionally selective responses from non-directional inputs. 

Our identification of the candidate motion detection pathway down- 
stream of L1 was greatly aided by the comprehensiveness of our electron 
microscopy reconstruction. Relative to connections estimated by arbor 
overlap”®, having the precise synaptic counts allows us to unequivocally 
establish connections (Supplementary Fig. 6). In this way, we identified 
Tm3 as a primary component of the motion detection circuit, a fact that 
escaped previous researchers owing to its minimal arborization in M10. 
Furthermore, relative to sparse reconstructions in other systems, for 
example, synaptic connections between SACs and DSGCs™, the com- 
prehensiveness enables us to argue both the absence of alternative path- 
ways and the numerical importance of the proposed pathway. 

The significance of the dense medulla connectome also goes far beyond 
the local motion detector, applying to many other visual computations. 
Although much remains to be done, especially to incorporate tangen- 
tial and infraperiodic cells fully, our connectome does contain the 
columnar neurons found in every column and hence removes a long- 
standing block to understanding insect vision. More generally, our 
results illustrate that, combined with a rich collection of experimental 
and theoretical results, connectomes can, by identifying underlying 
circuits, provide key insight into neuronal computation. 


METHODS SUMMARY 


A fly’s brain was prepared for electron microscopy by high-pressure freezing/ 
freeze substitution, followed by embedding, and sectioning. Sections were imaged 
using transmission electron microscopy. Electron micrographs were assembled 
into a three-dimensional stack correctly representing the relative locations of the 
imaged structures by means of nonlinear registration. The aligned stack of greyscale
images was partitioned by machine learning algorithms into subsets of pixels,
each belonging to only a single neuron (see https://bitbucket.org/shivnaga/sstem). 
This over-segmented partition was manually inspected and agglomerated into 
individual neurons (‘proofreading’). Presynaptic terminals and postsynaptic den- 
sities were then annotated by hand, and assigned to neurons (Supplementary Data 1). 
These manual steps required ~14,400 person-hours (including ~2,500 expert 
hours). The types of reconstructed neuron were determined by matching their 
shapes to examples from light microscopy. The full connectome was condensed
into a connectome module, which was partitioned into pathways using graph
layout and clustering algorithms. By counting the numbers of synaptic contacts
between L1, Mi1, Tm3 and T4 cells, we computed the relative displacement of the
input components. We cross-validated this displacement against the direction 
preference of T4 cells found by tracing their axons into the lobula plate. 


Full Methods and any associated references are available in the online version of 
the paper.

METHODS


Tissue preparation, electron microscopy and imaging. The right part of the 
brain of a wild-type Oregon R female fly was serially sectioned into 40-nm slices. 
A total of 2,769 sections, traversing the medulla and downstream neuropils
(Fig. 1c), were imaged at a magnification of ×5,000. This process is detailed in
the Supplementary Methods. 

Semi-automated reconstruction pipeline. To obtain a dense electron micro- 
scopy reconstruction of the reference column, we used a sequence of automated 
alignment and segmentation steps, followed by manual proofreading and recon- 
struction, which we described as the semi-automated reconstruction pipeline”. 
Image alignment. We first found a rough alignment of the full image stack, 
ignoring artefacts such as folds, tears and dirt occlusions, by using TrakEM2 rigid 
registration to align image blocks consisting of 20 sections of 9 × 9 mosaics, and
then aligning blocks by an automated search over images at the centre of each 
mosaic. This rough alignment served to determine which images overlapped, 
allowing more precise analysis of tissues with artefacts, and in particular, large 
folds. Pixels much darker than average were assumed to correspond to folds and 
were used to divide each image into two or more connected components, called 
patches. For each pair of overlapping patches, both within a section (along the 
boundary) and section-to-section, points of correspondence were found by cor- 
relation (~1 per 500,000 overlapping pixels). A least-squares fit of these points 
with regularization for scale and skew was then used to produce a global affine 
alignment of all patches. Examination of errors from this fit identified images for 
which the automatic division into patches was inaccurate, and these divisions 
were corrected manually. Once a satisfactory fit was obtained, each patch of each 
image was then slightly distorted to provide a best match to its neighbour(s) while 
still remaining close to the global affine. More details are available in ref. 49.
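The division of an image into patches along folds can be illustrated with a small connected-components sketch. It is not the released pipeline code: the image is a plain list of rows of grey values, pixels are 4-connected, and the threshold `dark_factor` (a fraction of the mean intensity) is a hypothetical stand-in for "much darker than average", for which the text gives no value.

```python
from collections import deque


def split_into_patches(image, dark_factor=0.5):
    """Label the connected non-fold regions (patches) of a greyscale image.

    image: list of equal-length rows of grey values.
    Pixels darker than dark_factor * mean are treated as fold (label 0).
    Returns (labels, n_patches), with patches numbered from 1.
    """
    height, width = len(image), len(image[0])
    mean = sum(sum(row) for row in image) / (height * width)
    threshold = dark_factor * mean
    labels = [[0] * width for _ in range(height)]
    n_patches = 0
    for r in range(height):
        for c in range(width):
            if image[r][c] < threshold or labels[r][c]:
                continue  # fold pixel, or already assigned to a patch
            n_patches += 1
            labels[r][c] = n_patches
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over 4-connected pixels
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx]
                            and image[ny][nx] >= threshold):
                        labels[ny][nx] = n_patches
                        queue.append((ny, nx))
    return labels, n_patches


# A bright image crossed by one dark vertical fold.
image = [[200, 200, 10, 200, 200] for _ in range(3)]
labels, n_patches = split_into_patches(image)
print(n_patches, labels[0])
```

Each patch would then be registered separately, so that tissue on either side of a fold can move independently during alignment.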
Automatic image segmentation. In the next step, we partitioned the medulla 
region of interest within the aligned stack of greyscale images into subsets of pixels 
belonging to individual neurons. Given that the resolution of the transmission 
electron microscopy (TEM) data set is anisotropic, we developed a two-step 
process comprising 2D segmentation to identify cross-sections of neurons fol- 
lowed by linkage of these segments in 3D. No single algorithm was used on all 
data, because many different segmentation techniques were tried in parallel with 
proofreading efforts, and it was counter-productive to re-segment portions 
already corrected. A typical 2D segmentation step entailed creating boundary 
probability maps using morphological features*’** followed by boosted edge 
learning’, mitochondria detection to reduce false boundaries”, followed by 
watershed segmentation” and agglomerative clustering” using mean and median 
boundary values to create 2D segments. The 3D linkage step constructed a linkage 
graph of consecutive 2D segments in adjacent sections. Again, several techniques 
were used, including simple metrics such as overlap, and machine-learning 
approaches that computed appropriate weights of features from previously proofread 
data. Further details of some of our automatic segmentation approaches can 
be found in previous publications. Given that all segmentation algorithms 
make mistakes resulting from imaging artefacts and low z-resolution, and because 
manual correction of over-segmentations is easier than under-segmentations, we 
tuned our automatic algorithms to produce an over-segmented image volume. 
Furthermore, we preserved watershed regions called super-pixels to facilitate the 
manual correction of over-segmentation in the next step. We have released the 
latest (and we believe best) version of the segmentation code that we used 
(https://bitbucket.org/shivnaga/sstem), but caution the reader that even our best automatic 
segmentation required extensive manual proofreading, correction, and 
annotation (see below) to yield the results we report in this paper. 
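
As an illustration of the simplest 3D linkage metric mentioned above (overlap), the sketch below links 2D segments in two adjacent, aligned sections when a large enough fraction of a segment's pixels lies under one segment in the next section. It is a toy version under stated assumptions: the label images are small nested lists, and the function name and the `min_fraction` threshold are hypothetical rather than taken from the released code.

```python
from collections import Counter


def link_by_overlap(section_a, section_b, min_fraction=0.5):
    """Return (label_a, label_b) pairs whose overlap covers at least
    min_fraction of the area of label_a. Both inputs are 2D label images
    of equal shape, given as lists of rows."""
    overlap = Counter()
    area_a = Counter()
    for row_a, row_b in zip(section_a, section_b):
        for la, lb in zip(row_a, row_b):
            area_a[la] += 1
            overlap[(la, lb)] += 1
    links = []
    for (la, lb), n in sorted(overlap.items()):
        if n / area_a[la] >= min_fraction:
            links.append((la, lb))
    return links
```

A learned linkage would replace the fixed threshold with weights over several features, as the text describes.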
Proofreading and reconstruction. We next inspected the results of automatic 
segmentation, corrected remaining errors, and assigned synapses to the proofread 
cell arbors. Because this was time-consuming, we trained a group of professional 
editors, referred to as proofreaders, whose work was supervised by three experi- 
enced electron microscopists (Sh.T., Sa.T. and P.K.R.) (experts). Proofreaders and 
experts performed their tasks using a dedicated custom software tool, Raveler 
(D.J.O. et al., manuscript in preparation). In total, these proofreading steps took 
~12,940 person-hours (including 900 person-hours contributed by our experts). 
There were five key steps within the reconstruction procedure: (1) volume proof- 
reading, (2) synapse annotation, (3) postsynaptic tracing, (4) anchor body refine- 
ment, and (5) selective sparse tracing, detailed in the Supplementary Methods. 
Reliability of the wiring diagram. As introduced above, we assigned two proof- 
readers to each synapse, to increase the reliability of proofreading. Characterizing 
this, in 48.2% of cases both proofreaders were unable to identify a parent cell 
either because they could not trace the process confidently or because it left the 
medulla region of interest. In 44.0% of cases, both proofreaders traced the PSD to 
the same anchor body, and in 7.5% of cases one proofreader was unable to 
complete the tracing whereas the other traced the PSD to an anchor body (numbers 
extracted from Supplementary Table 1). However, only in very few cases 
(0.23%) did the two proofreaders reach different anchor bodies. These numbers 
suggest that a large fraction of connections will be missed, following the two 
proofreader agreement process; however, all connections that are identified have 
a very high probability of being correct. 

To assess our reconstruction quality further, we generated two connectomes 
from the dual proofreader results—an inclusive version that included connec- 
tions found by either proofreader, and a consensus version in which connections 
were accepted only when both proofreaders agreed. Comparing these two con- 
nectomes was generally reassuring. Although the inclusive connectome has 
~16% more connections, all the additional connections had only one synaptic 
contact. All connections with two or more synapses are present in both connec- 
tomes (Supplementary Table 1). We used the consensus connectome for all our 
analyses. However, the conclusions remain unchanged when using the inclusive 
connectome. 
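
The difference between the two connectomes can be made concrete with a small sketch. It assumes, purely for illustration, that each proofreader's result is a dictionary mapping a postsynaptic site identifier to a (presynaptic cell, postsynaptic cell) pair, or to `None` when the site could not be traced; this data layout and the function name are hypothetical.

```python
from collections import Counter


def build_connectomes(trace_a, trace_b):
    """Count synaptic contacts per cell pair in the inclusive connectome
    (either proofreader) and the consensus connectome (both agree)."""
    inclusive, consensus = Counter(), Counter()
    for site in trace_a.keys() | trace_b.keys():
        a, b = trace_a.get(site), trace_b.get(site)
        if a is not None and a == b:
            consensus[a] += 1
        for pair in {a, b} - {None}:
            inclusive[pair] += 1
    return inclusive, consensus
```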

In general, the high rate of missed synaptic contacts in our proofreading was 
tolerable for our project because our intent was to study connections with several, 
parallel synaptic contacts. We could confirm that the connectome contains a large 
fraction of such strong connections by plotting the distribution of the number of 
contacts between connected pairs of cells. 

We found a strongly heavy-tailed distribution of the total numbers of contacts 
for each connected pair both in the whole connectome (inset in Fig. 3b), and 
within the subset of sparsely-traced cell-types involved in motion detection 
(Supplementary Fig. 3). Given that the sizes of T-bars within the medulla are 
relatively uniform (A. McGregor et al., unpublished observations) and the size of 
synaptic structures is thought to correlate with their physiological strength, we 
viewed the number of parallel synaptic contacts between two neurons as a proxy 
of synaptic weight. Furthermore, making an assumption that the probability of 
missing a synaptic contact during proofreading is uniform across all postsynaptic 
sites, we can estimate that the consensus connectome contains all connections 
with >5 synapses with a confidence level >95% (Supplementary Methods). 
Therefore, we believe our consensus connectome is both precise and compre- 
hensive for strong connections. 
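
The confidence estimate can be reproduced approximately with a short calculation. This is a sketch under an explicit assumption: each synaptic contact is taken to survive the two-proofreader consensus independently with probability 0.44 (the agreement rate quoted above); the exact procedure is in the Supplementary Methods and may differ. A connection with n contacts is then detected if at least one contact survives.

```python
def prob_connection_detected(n_contacts, p_contact):
    """Probability that at least one of n_contacts independent contacts
    is retained, when each is retained with probability p_contact."""
    return 1.0 - (1.0 - p_contact) ** n_contacts


def min_contacts_for_confidence(p_contact, confidence=0.95):
    """Smallest number of contacts for which detection reaches `confidence`."""
    n = 1
    while prob_connection_detected(n, p_contact) < confidence:
        n += 1
    return n


# Assumed per-contact retention probability of 0.44:
print(min_contacts_for_confidence(0.44))
```

Under this assumption, five contacts give a detection probability just below 0.95 and six contacts exceed it, in line with the ">5 synapses" threshold stated in the text.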
Constructing a connectome module. In constructing a repeating module within 
the medulla connectome, all the members of each class of neurons within the 
reference column were identified, and the number of synapses from those neu- 
rons to all other neurons of the postsynaptic class was averaged over the pre- 
synaptic cells. For the ultraperiodic cell types, for example, Tm3, this could result 
in a fractional weight. Furthermore, because there are 1.5 Tm3 cells per column 
on average, we computed the synaptic weight by multiplying it by 1.5. These 
fractional weights provide the mean connection strength over different columns, 
because some columns have only a single Tm3, whereas others have two. 
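
A small worked example of this weighting, with made-up contact counts, may help. The function name and the numbers are hypothetical; only the averaging over presynaptic cells and the factor of 1.5 for Tm3 come from the text.

```python
def module_weight(contacts_per_presynaptic_cell, cells_per_column=1.0):
    """Mean number of contacts per presynaptic cell, scaled by the mean
    number of cells of that type per column."""
    mean = sum(contacts_per_presynaptic_cell) / len(contacts_per_presynaptic_cell)
    return mean * cells_per_column


# Three hypothetical Tm3 cells making 4, 7 and 5 contacts onto one target class:
weight = module_weight([4, 7, 5], cells_per_column=1.5)
```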

The directional summation that was used here was chosen because, in our 
reconstruction, we attempted to proofread each postsynaptic element to its assoc- 
iated neuron. However, we did not attempt to proofread every presynaptic site 
back to its parent neuron (as some such elements might derive from non-reference 
columns, which were not densely reconstructed). 

This method was modified for the four connections in the motion detection 
circuit: L1 to Mi1, L1 to Tm3, Mi1 to T4, and Tm3 to T4. In these cases, the 
number of synaptic contacts from the densely reconstructed medulla connectome 
was replaced by the number of synaptic contacts identified during sparse tracing 
of these specific connections (Supplementary Table 3). 

Computing the Mi1 and Tm3 receptive field components. We computed the 
Mi1 and Tm3 components of the receptive field for each T4, by multiplying the 
number of synaptic contacts from each L1 neuron to a single intermediate Mi1 or 
Tm3 neuron by the number of contacts from that intermediate cell to the T4, and 
then summing over all the Mi1 and Tm3 neurons that receive input from the same 
L1 (Fig. 4e). This multiplication is equivalent to counting the number of inde- 
pendent synaptic routes from each L1 to each T4, in which each route must use a 
different pair of the synaptic contacts between the L1 and the intermediate target 
cells, and the intermediate targets and the T4. 
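
The multiply-and-sum rule amounts to a product of two contact matrices. The sketch below assumes a nested-dictionary layout and uses invented cell names and counts; to obtain the Mi1 and Tm3 components separately, one would pass only the intermediate cells of the relevant type.

```python
def receptive_field(l1_to_mid, mid_to_t4):
    """For one T4 cell, count the independent two-step synaptic routes
    from each L1 cell.

    l1_to_mid[l1][mid] = contacts from an L1 cell to an intermediate cell
    mid_to_t4[mid]     = contacts from that intermediate cell to the T4
    """
    field = {}
    for l1, targets in l1_to_mid.items():
        field[l1] = sum(n * mid_to_t4.get(mid, 0) for mid, n in targets.items())
    return field
```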

Monte Carlo error estimate. Our proofreading methodology results in very few 
false positive errors, but many false negative errors between neuron pairs (see 
earlier). Therefore, it is highly probable that the observed number of synaptic 
contacts (m) is a subset of a higher, true number of synaptic contacts between two 
neurons (n). Assuming that, for any connected pair of neurons each synaptic 
contact had an equal false negative probability, and using that probability in a 
binomial distribution, our goal was to estimate the posterior probability, P(n|m) 
(that is, the probability of the true n given the observed m), for different values of 
n. To do this, because the prior distribution over n is unknown, we approximated 
this posterior probability by the likelihood, P(m|n) (that is, the probability of 
observing m given some value of the true n), for different values of n. We gen- 
erated 1,000 different estimates of the true connection weight, for every connected 
neuron pair, by sampling from the set of possible n values with the computed 
likelihood given each value of n. Neuron pairs that were disconnected because of 
high false negative error rates were also allowed to connect if their neurite profiles 
overlapped in 3D. The number of contacts was estimated using the same sampling 
method as before (but with m equal to zero). In this way, we generated 1,000 
different connectivity matrices. The displacement was then computed for each 
matrix. The eigenvalues and eigenvectors of the covariance matrix over all displacement 
vectors were computed, and used to plot an ellipse along the eigenvector 
axes with 2σ confidence intervals (Fig. 5a). To extend this construction to 
the mean displacement over a group of neurons, 1,000 variations of the mean were 
computed, by first sampling from the displacements generated for each neuron, 
and then computing the mean for each set of samples. The ellipses were then com- 
puted, in the same way as before, over this set of 1,000 mean values (Fig. 5b, top). 
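
The sampling step can be sketched as follows. This is an illustration under assumptions, not the code used for the paper: the per-contact detection probability `p_detect` and the upper bound `n_max` on the true count are hypothetical parameters, and the likelihood is the binomial P(m|n) described above, used as unnormalized sampling weights.

```python
import math
import random


def binom_likelihood(m, n, p_detect):
    """P(m | n): probability of observing m contacts out of n true contacts."""
    return math.comb(n, m) * p_detect ** m * (1.0 - p_detect) ** (n - m)


def sample_true_counts(m, p_detect, n_max, n_samples=1000, rng=None):
    """Draw candidate true contact counts n >= m, weighted by P(m | n)."""
    rng = rng or random.Random(0)
    candidates = list(range(m, n_max + 1))
    weights = [binom_likelihood(m, n, p_detect) for n in candidates]
    return rng.choices(candidates, weights=weights, k=n_samples)
```

Calling `sample_true_counts(0, p_detect, n_max)` covers the case of overlapping but apparently disconnected pairs (m equal to zero). Repeating the draw for every pair yields one sampled connectivity matrix per sample index.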
Effect of limited reconstruction size. Because the spread of the dendritic arbors 
of the Tm3 cells connected to a given T4 is greater than that of the Mi1 cells, Tm3 
cells are more likely to be partially reconstructed, and, hence, missing L1 inputs 
near the edges of our 19 column reconstruction. To provide a measure of the 
effect of this cut-off on each T4, we first visually inspected each Tm3 neuron, and 
classified it into one of four classes, depending on the percentage of the arbor that 
was missing from the 19 column reconstruction. Tm3 cells that were recon- 
structed fully received input from an average of six L1 neurons. In contrast, Tm3 
cells in the fourth class, which had most of their arbor outside our region of 
interest, received input from an average of only two L1 neurons. By summing 
up the fraction of L1 inputs missing, weighted by the fraction of inputs provided 
by each Tm3 class to the given T4, we obtained an estimate of the weighted 
fraction of L1 inputs missing (through the Tm3 channel) for each T4 (the x axis 
in Fig. 5b, bottom). We also used this metric to justify the removal of T4 cells 
missing >15% of their weighted L1 inputs (through the Tm3 channel), in con- 
structing the mean responses for T4 cells with the same output direction prefer- 
ences (Fig. 5b, top). 
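
A minimal sketch of the weighted missing-input metric follows. Only the values for fully reconstructed cells (six L1 inputs on average) and for the fourth class (two L1 inputs on average) come from the text; the missing fractions for the two intermediate classes, the contact counts, and the function name are hypothetical.

```python
def weighted_missing_fraction(contacts_by_class, missing_by_class):
    """Fraction of L1 inputs missing through the Tm3 channel for one T4,
    weighting each Tm3 class by its share of the T4's Tm3 input."""
    total = sum(contacts_by_class.values())
    if total == 0:
        return 0.0
    return sum(n / total * missing_by_class[c] for c, n in contacts_by_class.items())


# Class 1: fully reconstructed; class 4: 2 of an expected 6 L1 inputs seen.
# Classes 2 and 3 use placeholder values.
missing_by_class = {1: 0.0, 2: 0.2, 3: 0.4, 4: 1.0 - 2.0 / 6.0}
contacts_by_class = {1: 30, 2: 10, 4: 10}  # hypothetical Tm3-to-T4 contacts
fraction = weighted_missing_fraction(contacts_by_class, missing_by_class)
keep_cell = fraction <= 0.15
```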

Dendrite orientation. After proofreading, the T4 arborizations within M10 were 
skeletonized, starting from the centre of the thickest branch, adapting the method 
of ref. 59 with a modified weighting function. The axonal branch point was 
determined by Sh.T. so as to identify the dendritic part of the medulla arboriza- 
tion precisely. The local orientation around each dendritic node was computed 
(using three nodes in both directions around each node, and ignoring any nodes 
with fewer than three adjacent nodes in both directions). The dominant dendritic 
branch orientation, for each T4 (Fig. 6b), was computed by taking the fast Fourier 
transform (MATLAB and Statistics Toolbox, version R2012b) of the distribution 
of orientations across nodes, and defined to be the phase of the fundamental 
mode component of the transformation. Assuming a normal distribution for the 
dominant orientation of the cells assigned to a layer (via their axon arbor depth), 
we computed the probability of each untraced T4 lying within each cluster of cells, 
given its measured dominant orientation. The assignment of cell T4-4 to layer 1 
was significant (P < 0.05), but the assignment of the other two cells was not. The 
colour map for the dendritic arbors (Fig. 6a and Supplementary Fig. 5) was chosen 
by centring colours on the average of the dendritic branch orientation for each 
cluster (Fig. 6b), and varying the colour continuously between clusters. 
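
The idea of taking the phase of the fundamental Fourier mode can be illustrated in a few lines. The analysis in the paper used MATLAB; the Python sketch below is a stand-in that computes only the first discrete Fourier coefficient of an orientation histogram. The bin count, the use of directed angles over 0° to 360°, and the function name are assumptions.

```python
import cmath
import math


def dominant_orientation(angles_deg, n_bins=36):
    """Dominant orientation (degrees) as the phase of the fundamental
    Fourier mode of the histogram of node orientations."""
    hist = [0] * n_bins
    width = 360.0 / n_bins
    for a in angles_deg:
        hist[int((a % 360.0) // width) % n_bins] += 1
    # First (fundamental) DFT coefficient of the histogram.
    c1 = sum(h * cmath.exp(-2j * math.pi * k / n_bins) for k, h in enumerate(hist))
    # A histogram peaked at bin k0 gives a phase of -2*pi*k0/n_bins; undo the
    # sign and shift by half a bin so the result refers to bin centres.
    phase = -cmath.phase(c1)
    return (math.degrees(phase) + width / 2.0) % 360.0
```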

Data access. At publication the skeletons of all cells will be uploaded to 
http://neuromorpho.org. We will host a website 
(https://openwiki.janelia.org/wiki/display/flyem/Medulla+TEM+Reconstruction) that will allow the viewer to search the 
consensus connectome for connected neuron pairs. We will also provide the entire 
segmentation with synapse annotations, along with Raveler to open the data set, 
upon request. The requesting party will need to supply a hard drive. 

Saturn’s moon Enceladus emits a plume of water vapour and micrometre-sized 
ice particles from a series of warm fissures located near 
its south pole. This geological activity could be powered or controlled 
by variations in the tidal stresses experienced by Enceladus as 
it moves around its slightly eccentric orbit. The specific mechanisms 
by which these varying stresses are converted into heat, however, are 
still being debated. Furthermore, it has proved difficult to find a 
clear correlation between the predicted tidal forces and measured 
temporal variations in the plume’s gas content or the particle 
flux from individual sources. Here we report that the plume’s 
horizontally integrated brightness is several times greater when 
Enceladus is near the point in its eccentric orbit where it is furthest 
from Saturn (apocentre) than it is when near the point of closest 
approach to the planet (pericentre). More material therefore seems 
to be escaping from beneath Enceladus’ surface at times when geophysical 
models predict its fissures should be under tension 
and therefore may be wider open. 

This analysis focuses on 252 images of the Enceladus plume at 
wavelengths of 0.88–1.56 μm obtained by the Cassini spacecraft’s 
Visual and Infrared Mapping Spectrometer (VIMS) between 2005 
and 2012 (see Supplementary Information). Although these VIMS 
observations could not resolve individual jets and sources, they all 
had sufficient resolution and signal-to-noise ratio to detect the plume 
as a whole (see Fig. 1). The position of Enceladus along its orbit during 
each of these observations is given by the ‘orbital phase’, f: that is, the 
difference between the moon’s orbital longitude and the longitude of 
its pericentre (also known as the moon’s true anomaly). For the data 
considered here, f varies between −40° and +200°. Hence the observations 
sample times when Enceladus was near the pericentre (f ≈ 0°) 
and near the apocentre (f ≈ 180°) of its eccentric orbit, and span a 
broad range of tidal stress states. 

Measurements made at different orbital phases and different times 
can only be sensibly compared to one another if we also account for 
variations in the viewing geometry, especially the phase angle α (that is, 
the angle between the light rays incident on the plume and the scattered 
light rays that reach the camera). The micrometre-sized plume 
particles are most efficient at scattering light at large phase angles, so 
the plume will appear brighter when viewed at higher phase angles. 
Fortunately, the VIMS observations covered a range of phase angles 
when Enceladus was near the pericentre and the apocentre of its 
eccentric orbit, so we can control and compensate for these brightness 
variations due to changes in the viewing geometry. For example, the 
data shown in Fig. 1 compare measurements made at two different 
orbital phases for two different phase angles. At both phase angles the 
plume is brighter in the observation obtained when Enceladus was 
near the apocentre of its orbit, which strongly suggests that tidal forces 
do play an important role in controlling Enceladus’ activity. 


Owing to variations in the distance between Cassini and Enceladus, 
different observations sample the plume’s brightness at different alti- 
tudes. Hence, in order to derive comparable quantitative estimates of 
the plume’s brightness, we interpolate the brightness data from each 
observation to a common altitude. Fortunately, the plume’s brightness 
decreases with altitude in a regular manner. Let us define the plume’s 
‘equivalent width’ (EW) at a given altitude z above Enceladus’ south 
pole as the total integrated brightness in a horizontal slice through the 
plume (that is, a fixed z) after removing any background signal from 
Saturn’s E ring (see Supplementary Information). For low-optical- 
depth systems like the plume, this quantity is insensitive to both the 
image resolution and the alignment of the fissures, which facilitates 
comparisons between different observations. Furthermore, as shown 
in Fig. 2, EW is a nearly linear function of the parameter 
$Z = [z/(r_E + z)]^{1/2}$, where $r_E = 250$ km is the radius of Enceladus. This 
trend is not only a useful empirical fit to the data, it can also be 
physically justified on the basis of considerations of the velocity distribution 
of the plume particles. For the observable parts of the plume, 
we can assume that the particle and gas density are so low that the 
particles follow purely ballistic trajectories and that Enceladus’ gravity 
is by far the dominant force acting on the particles. In this situation, an 
individual particle launched from the surface at a velocity $v_0$ that is less 
than Enceladus’ escape speed, $v_{\mathrm{esc}} = 240$ m s$^{-1}$, will reach an altitude 
$z_{\max}$ before it falls back to the surface. The velocity of the particle passes 
through zero when it reaches $z_{\max}$, and energy conservation requires 
that $v_0 = v_{\mathrm{esc}}[z_{\max}/(r_E + z_{\max})]^{1/2}$. Hence a particle launched at speed $v_0$ 
spends the most time near $Z = v_0/v_{\mathrm{esc}}$ and makes the largest contribution 
to the plume’s brightness at that location. The steady decrease in the 
plume’s brightness as a function of Z therefore implies that fewer particles are 
launched at higher velocities, consistent with previous analyses. 
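
The mapping between altitude, the parameter Z and launch speed can be checked numerically. The sketch below uses only the values quoted in the text (a radius of 250 km and an escape speed of 240 m s⁻¹); the function names are chosen for illustration.

```python
import math

R_E_KM = 250.0      # radius of Enceladus, from the text
V_ESC_M_S = 240.0   # escape speed, from the text


def z_parameter(altitude_km):
    """Z = [z / (r_E + z)]^(1/2) for an altitude z above the surface."""
    return math.sqrt(altitude_km / (R_E_KM + altitude_km))


def launch_speed(max_altitude_km):
    """Launch speed (m/s) of a ballistic particle that peaks at this altitude."""
    return V_ESC_M_S * z_parameter(max_altitude_km)


def altitude_from_z(z_param):
    """Inverse mapping: altitude (km) corresponding to a given Z < 1."""
    return R_E_KM * z_param ** 2 / (1.0 - z_param ** 2)
```

For example, altitudes of 50, 85 and 450 km correspond to Z of roughly 0.4, 0.5 and 0.8, the values used for the fits described in the text.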

Fitting the plume’s EW profile in each image to a linear function of Z 
for altitudes between 50 and 450 km (that is, Z between 0.4 and 0.8), we 
may estimate $\mathrm{EW}_{85}$, the plume’s equivalent width at an altitude of 
85 km (that is, Z = 0.5) for each observation. Figure 3 plots these estimates 
of the plume’s brightness as a function of the observed phase 
angle. Note that at any given phase angle, observations taken when 
Enceladus was near its orbital apocentre are systematically brighter than 
those taken when Enceladus was near its orbital pericentre. (This trend 
persists even if we control for other geometric parameters like the sub- 
spacecraft latitude or longitude.) Furthermore, the data obtained when 
Enceladus was close to either its pericentre or its apocentre appear to 
follow a simple empirical phase function $P(\alpha)$ where the brightness is a 
power-law function of the scattering angle $\theta = 180^\circ - \alpha$ with an index 
of around 2.5 (that is, $P(\alpha) \propto \theta^{-2.5}$, see Fig. 3). We may therefore define 
a ‘corrected equivalent width’, $\mathrm{CEW} = \mathrm{EW}_{85} \times [\theta/20^\circ]^{2.5}$. So long as 
the plume’s phase function is approximately proportional to $\theta^{-2.5}$, then 
these corrected widths should be nearly independent of phase angle and 

Figure 1 | Sample VIMS observations of Enceladus and its plume. All the 
images are made at wavelengths of 0.88–1.56 μm, have been rotated so that 
Enceladus’ north pole points straight up, and have been projected onto a 
common spatial scale. In each panel, Enceladus appears as a dark disk 
silhouetted against the E-ring. The bright crescents at the upper right of these 
disks correspond to the illuminated part of the moon, and the plume appears as a 
diffuse bright streak below the moon’s south pole (compare with Supplementary 
Fig. 2). Black regions correspond to regions not covered by the original 
observation. The observation name is given at the top of each panel, with the 
relevant observation date and geometric parameters (see Supplementary 
Information). Images a and b were obtained at a phase angle (α) of 150°, whereas 
images c and d were acquired at α ≈ 163.5°. Images taken at the same phase angle 
use the same stretch, but the images taken at a phase angle of 150° have a 
different stretch from the ones obtained at around 163.5° phase. Note that the 
plume is significantly brighter in a and c (where Enceladus was near its orbital 
apocentre), than it is in b and d (where the moon was near its orbital pericentre). 
Above each panel are given: date (year-day); range (distance between the 
spacecraft and Enceladus); sub-spacecraft latitude (SSC lat.); sub-spacecraft 
longitude (SSC long.); phase angle α; and orbital phase, f. 


correspond to the plume brightnesses VIMS would have measured if it 
had always observed the plume at a phase angle of 160°. 
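
The two-step estimate (a linear fit in Z evaluated at Z = 0.5, followed by the phase-angle correction) can be sketched as below. The profile values in the usage lines are synthetic, and the function names are assumptions; the exponent of 2.5 and the 20° reference scattering angle follow the definition of the corrected equivalent width in the text.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def corrected_equivalent_width(ew85, phase_angle_deg, index=2.5, ref_theta_deg=20.0):
    """CEW = EW_85 * (theta / 20 deg)^2.5, with theta = 180 deg - phase angle."""
    theta = 180.0 - phase_angle_deg
    return ew85 * (theta / ref_theta_deg) ** index


# Synthetic profile that falls linearly to zero at Z = 0.8:
zs = [0.4, 0.5, 0.6, 0.7, 0.8]
ews = [4.0 * (0.8 - z) for z in zs]
slope, intercept = linear_fit(zs, ews)
ew85 = slope * 0.5 + intercept
cew = corrected_equivalent_width(ew85, phase_angle_deg=150.0)
```

At a phase angle of 160° the scattering angle equals the 20° reference, so the correction factor is one and the corrected and uncorrected widths coincide.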

Figure 4 shows a plot of the resulting corrected equivalent width 
estimates as a function of Enceladus’ orbital phase. Note that the data 
from different phase angles follow the same trend, indicating that our 
approximate phase function is reasonably successful at correcting all 
these data. This plot confirms that the plume is indeed substantially 
brighter when Enceladus is at apocentre than when it is at pericentre. 
In fact, the plume’s integrated brightness increases by more than a factor 
of three as Enceladus moves from pericentre to apocentre, and most of 
this change seems to occur between orbital phases of 90° and 180°. 

Initial investigations of the longer-wavelength data obtained by 
VIMS reveal the same basic trends with orbital phase, albeit at lower 
signal-to-noise ratio. Thus far we have not detected any statistically 
significant variations in the shape of the plume’s spectrum between 
observations taken when Enceladus is at pericentre versus apocentre, 

Figure 2 | Sample vertical profiles of the plume’s brightness. a–d, The 
plume’s equivalent width as a function of $Z = [z/(r_E + z)]^{1/2}$ for the four 
observations shown in Fig. 1a–d respectively. (As discussed in the text, Z can be 
regarded as a proxy for the particles’ launch velocity.) In each panel, the 
diamonds show the measurements while the line shows a simple linear fit to the 
data between altitudes z of 50 and 450 km (Z = 0.4 and 0.8, respectively). This 
range was chosen because it excludes regions that are either too close to 
Enceladus (where the moon’s limb may contaminate the signal) or too far from 
the moon (where the signal is weak). Note that in all four cases, this simple 
model provides a reasonable match to the trends in the data. 


implying that the observable particle size distribution is not a strong 
function of orbital phase. 

Similarly, the data taken between 2009 and 2012 all exhibit the same 
variations with orbital phase (see Fig. 4), indicating that the plume’s 
activity level at a given orbital phase has not changed radically in the 
past few years. However, the 2005 observations yield brightness levels 
that are roughly 50% higher than comparable later observations. This 
may represent a decrease in the plume’s average activity level between 
2005 and 2009. However, even if this turns out to be the case, the 2005 
data show the same trend of increasing brightness with increasing 
orbital phase as the later data. The variation in the plume’s activity 
on orbital timescales therefore appears to be a persistent phenomenon. 

These trends are also insensitive to altitude up to 300 km from the 
moon’s surface (where the plume is clearly detectable). However, at 
higher altitudes, these trends might reverse owing to variations in the 
plume’s scale height with orbital phase. In Fig. 2, the linear fits for the 
two observations made when Enceladus was close to its orbital apoc- 
entre intercept the x-axis when Z ~ 0.8. By contrast, the two observa- 
tions made when Enceladus was near its orbital pericentre yield trends 
that go to zero when Z is at least 0.85. This distinction appears to be 
found consistently among the other observations: the weighted average 
of the x-intercepts for the profiles obtained when Enceladus was 
within ±40° of pericentre occurs at Z = 0.835 ± 0.013, whereas for the 

Figure 3 | Variations in the plume’s brightness with phase angle. This plot 
shows the plume’s equivalent width at 85 km altitude (Z = 0.5) and 0.88–1.56 μm 
as a function of phase angle α. Colours indicate observations made when 
Enceladus was at different orbital phases f, and symbols indicate when the 
observations were taken. The 1 s.d. statistical error bars on these data points are 
smaller than the symbol sizes (they range between 0.001 and 0.03, with most 
being between 0.005 and 0.01, see Supplementary Information). This plot shows 
that at all phase angles, the plume is consistently brighter when it is observed close 
to Enceladus’ orbital apocentre. The two lines show fiducial phase functions that 
are proportional to $\theta^{-2.5}$, where θ is the scattering angle. The data obtained near 
Enceladus’ apocentre follow this phase function fairly closely. For the pericentre 
data, the data do not match the predicted trend quite as well, but the above phase 
function is still an acceptable approximation to the true phase curve. 


data obtained within ±20° of apocentre the average intercept occurs at 
Z = 0.793 ± 0.008. This can be interpreted as a difference in the maximum 
launch velocity of the observed particles, with $v_{\max} = 200 \pm 3$ m s$^{-1}$ 
when Enceladus is near pericentre and $v_{\max} = 190 \pm 2$ m s$^{-1}$ when 
Enceladus is near apocentre. Hence the particles visible at 
0.88–1.56 μm seem to be launched with a slightly larger maximum speed 

Figure 4 | Variations in the plume’s corrected brightness with Enceladus’ 
orbital position. This plot shows the plume’s corrected equivalent width at 
85 km altitude and 0.88–1.56 μm as a function of Enceladus’ orbital phase f. 
Colours indicate observations made at different phase angles α, and symbols 
indicate when the observations were taken. The 1 s.d. statistical error bars on 
these data are smaller than the symbol sizes (they range between 0.001 and 0.03, 
with most being between 0.005 and 0.015, see Supplementary Information). 
Note that these data have now been corrected to remove the brightness 
variations due to varying phase angles by multiplying all the $\mathrm{EW}_{85}$ values by a 
factor of $(\theta/20^\circ)^{2.5}$. With this correction applied, the data taken at different 
phase angles follow a common trend with orbital phase. 

when Enceladus is near its orbital pericentre. This could reflect tidally 
induced changes in the crack geometry. However, this trend is much 
more subtle than the variation in brightness, and so additional work 
will be needed before we can securely interpret this phenomenon. 
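
The quoted maximum launch speeds follow directly from the fitted x-intercepts, because a profile that reaches zero at a given Z corresponds to a maximum launch speed of Z times the escape speed. The short check below uses the intercepts and the 240 m s⁻¹ escape speed reported in the text; the function name is chosen for illustration and the uncertainty is propagated linearly.

```python
V_ESC_M_S = 240.0  # escape speed of Enceladus, from the text


def max_launch_speed(z_intercept, z_sigma):
    """Maximum launch speed and its 1-sigma uncertainty (m/s) from the
    x-intercept of the linear EW(Z) fit."""
    return V_ESC_M_S * z_intercept, V_ESC_M_S * z_sigma


v_peri, s_peri = max_launch_speed(0.835, 0.013)  # near pericentre
v_apo, s_apo = max_launch_speed(0.793, 0.008)    # near apocentre
```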

A peak in plume activity when Enceladus is near its orbital apoc- 
entre is consistent with various geophysical calculations that suggest 
the normal stresses in Enceladus’ south polar terrain will place the 
fissures under tension when Enceladus is near apocentre, and in compression 
when Enceladus is near pericentre. Hence the data 
we report here provide strong evidence that tidal forces do play an 
important role in controlling Enceladus’ plume activity, perhaps by 
changing the width of the conduits between the surface and various 
underground reservoirs. 

Squeezed light from a silicon micromechanical resonator 


Amir H. Safavi-Naeini, Simon Gröblacher, Jeff T. Hill, Jasper Chan, Markus Aspelmeyer & Oskar Painter 


Monitoring a mechanical object’s motion, even with the gentle 
touch of light, fundamentally alters its dynamics. The experimental 
manifestation of this basic principle of quantum mechanics, its 
link to the quantum nature of light and the extension of quantum 
measurement to the macroscopic realm have all received extensive 
attention over the past half-century. The use of squeezed light, with 
quantum fluctuations below that of the vacuum field, was proposed 
nearly three decades ago as a means of reducing the optical read-out 
noise in precision force measurements. Conversely, it has also been 
proposed that a continuous measurement of a mirror’s position with 
light may itself give rise to squeezed light. Such squeezed-light 
generation has recently been demonstrated in a system of ultracold 
gas-phase atoms whose centre-of-mass motion is analogous to the 
motion of a mirror. Here we describe the continuous position mea- 
surement of a solid-state, optomechanical system fabricated from a 
silicon microchip and comprising a micromechanical resonator 
coupled to a nanophotonic cavity. Laser light sent into the cavity is 
used to measure the fluctuations in the position of the mechanical 
resonator at a measurement rate comparable to its resonance fre- 
quency and greater than its thermal decoherence rate. Despite the 
mechanical resonator’s highly excited thermal state ($10^4$ phonons), 
we observe, through homodyne detection, squeezing of the reflected 
light’s fluctuation spectrum at a level 4.5 ± 0.2 per cent below that of 
vacuum noise over a bandwidth of a few megahertz around the 
mechanical resonance frequency of 28 megahertz. With further 
device improvements, on-chip squeezing at significant levels should 
be possible, making such integrated microscale devices well suited 
for precision metrology applications. 

The generation of states of light with fluctuations below that 
of vacuum has been of great theoretical interest since the 1970s. 
Early experimental work demonstrated squeezing of a few per cent 
below the vacuum noise level in a large variety of different nonlinear 
systems, such as neutral atoms in a cavity, optical fibres and crystals 
with bulk optical nonlinearities. Modern experiments demonstrate 
squeezing of almost 13 dB (ref. 14). Initial research was mainly pursued 
as a strategy to mitigate the effects of shot-noise, the manifestation 
of vacuum noise in the intensity detection of light, given the possibility 
of improved optical communication and better sensitivity in 
gravitational-wave detectors. In recent years, in addition to being 
used in gravitational-wave detectors, squeezed light has enhanced 
metrology in more applied settings. 

The vacuum fluctuations arising from the quantum nature of light 
determine our ability to resolve mechanical motion optically, and set 
limits on the perturbation caused by the act of measurement. A 
system well suited to studying quantum measurement experimentally 
is that of cavity optomechanics, in which an optical cavity’s resonance 
frequency is designed to be sensitive to the position of a mechanical 
system. By monitoring the phase and intensity of the reflected light 
from such a cavity, a continuous measurement of mechanical displace- 
ment can be made. Systems operating on this simple principle have 
been realized in a variety of experimental settings, such as in large-scale 
laser gravitational-wave interferometers, microwave circuits with 
electromechanical elements (ref. 20), solid-state mechanical elements 
(refs 21–23) and ultracold gas-phase atoms integrated with or comprising Fabry–Perot 
cavities, and on-chip nanophotonic cavities sensitive to mechanical 
deformations. 

The canonical cavity-optomechanical system consists of an optical 
cavity resonance that is dispersively coupled to the position of a mechanical 
resonance. The Hamiltonian describing the interaction between 
light and mechanics is $H_{\mathrm{int}} = \hbar g_0 \hat a^\dagger \hat a\,\hat x/x_{\mathrm{zpf}}$, where $\hat x = x_{\mathrm{zpf}}(\hat b^\dagger + \hat b)$ is 
the mechanical position, $x_{\mathrm{zpf}} = \sqrt{\hbar/(2 m_{\mathrm{eff}}\omega_m)}$ is the zero-point fluctuation 
amplitude, $\omega_m$ is the mechanical resonance frequency, $m_{\mathrm{eff}}$ is 
the effective motional mass of the resonator, $g_0$ is the frequency shift of 
the optical resonance for a mechanical amplitude of $x_{\mathrm{zpf}}$, $\hbar$ is Planck’s 
constant divided by $2\pi$, $\hat a$ and $\hat a^\dagger$ are respectively the annihilation and 
creation operators for optical excitations, and $\hat b$ and $\hat b^\dagger$ are the analogous 
operators for mechanical excitations. The optical cavity decay rate, 
$\kappa$, is the loss rate of photons from the cavity and the rate at which 
optical vacuum fluctuations are coupled into the optical resonance. 
Similarly, the mechanical damping rate, $\gamma_i$, is the rate at which thermal 
bath fluctuations couple to the mechanical system. In all experimental 
realizations of solid-state optomechanics so far, including that presented 
here, the optomechanical coupling rate, $g_0$, has been much 
smaller than $\kappa$. As such, without a strong coherent drive, the interaction 
of the vacuum fluctuations with the mechanics is negligible. 

Under the effect of a coherent laser drive, the cavity is populated with a mean intracavity photon number $\langle n_c\rangle$, and we consider the optical fluctuations about the classical steady state, $\hat a \to \sqrt{\langle n_c\rangle} + \hat a$. This modifies the optomechanical interaction, resulting in a linear coupling between the fluctuations of the intracavity optical field, $\hat X_0 = \hat a + \hat a^\dagger$, and the position fluctuations of the mechanical system, $\hat x$: $H_{\mathrm{int}} = \hbar G \hat X_0\, \hat x / x_{\mathrm{zpf}}$. The parametric linear coupling occurs at an effective interaction rate of $G = \sqrt{\langle n_c\rangle}\, g_0$, and the mechanical motion is coupled to the intracavity optical field at a rate of $\Gamma_{\mathrm{meas}} \equiv 4G^2/\kappa$. Through this interaction, the intensity fluctuations of the vacuum field, $\hat X^{(\mathrm{in})}_{\theta=0}(t)$, entering the cavity impart a force on the mechanical system:

$$\hat F_{\mathrm{BA}}(t) = \frac{\hbar\sqrt{\Gamma_{\mathrm{meas}}}}{x_{\mathrm{zpf}}}\,\hat X^{(\mathrm{in})}_{\theta=0}(t) \qquad (1)$$

This radiation-pressure shot-noise (RPSN) force has previously been measured in an ultracold atomic gas and, more recently, on a macroscopic silicon nitride nanomembrane. The mechanical motion is in turn recorded in the phase of the light leaving the cavity:

$$\hat X^{(\mathrm{out})}_{\theta}(t) = -\hat X^{(\mathrm{in})}_{\theta}(t) - 2\sqrt{\Gamma_{\mathrm{meas}}}\,\frac{\hat x(t)}{x_{\mathrm{zpf}}}\,\sin(\theta) \qquad (2)$$

Here $\hat X^{(j)}_{\theta} = \hat a_j e^{-i\theta} + \hat a_j^\dagger e^{i\theta}$, where $\hat a_{\mathrm{in}}$ and $\hat a_{\mathrm{out}}$ are respectively the operators of the input and reflected optical fields from the cavity, and $\theta$ is the quadrature angle, with $\theta = 0$ and $\theta = \pi/2$ corresponding respectively to the intensity and phase quadratures. In such a measurement, the optical cavity has the role of the position detector, measuring the observable $\hat x$ at a rate $\Gamma_{\mathrm{meas}}$, and the RPSN imposes a measurement back-action force on the mechanical system. In addition to this back-action noise, thermal fluctuations from the bath also drive the mechanical motion, with their magnitudes becoming comparable as $\Gamma_{\mathrm{meas}}$ approaches the thermalization rate, $\Gamma_{\mathrm{thermal}}(\omega) \equiv \gamma_i \bar n(\omega)$, where $\bar n(\omega)$ is the thermal bath occupancy.
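
As a rough numerical illustration of how the two rates compare, the sketch below evaluates the thermal occupancy, the measurement rate and their ratio. It assumes a Bose-Einstein bath and plugs in the device values reported for this sample in the Letter (mechanical frequency 28 MHz, bath temperature 16 K, linewidth 172 Hz, vacuum coupling rate 750 kHz, optical linewidth 3.42 GHz, 790 intracavity photons). The function names are illustrative only and are not taken from the authors' analysis.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K


def thermal_occupancy(freq_hz, temperature_k):
    """Bose-Einstein occupancy of a mode at freq_hz for a bath at temperature_k."""
    x = HBAR * 2.0 * math.pi * freq_hz / (K_B * temperature_k)
    return 1.0 / math.expm1(x)


def measurement_rate(g0, kappa, n_c):
    """Gamma_meas = 4 G^2 / kappa with G = sqrt(n_c) * g0.

    If g0 and kappa are passed as rate/2pi in Hz, the result is Gamma_meas/2pi in Hz.
    """
    g_eff = math.sqrt(n_c) * g0
    return 4.0 * g_eff ** 2 / kappa


def thermalization_rate(gamma_i, n_bar):
    """Gamma_thermal = gamma_i * n_bar (same units as gamma_i)."""
    return gamma_i * n_bar


# Device values quoted in the Letter (all rates as rate/2pi, in Hz).
n_bar = thermal_occupancy(28e6, 16.0)
gamma_meas = measurement_rate(750e3, 3.42e9, 790)
gamma_thermal = thermalization_rate(172.0, n_bar)
rate_ratio = gamma_meas / gamma_thermal

print(f"n_bar ~ {n_bar:.3g}")
print(f"Gamma_meas/2pi ~ {gamma_meas:.3g} Hz, Gamma_thermal/2pi ~ {gamma_thermal:.3g} Hz")
print(f"Gamma_meas/Gamma_thermal ~ {rate_ratio:.2f}")
```

Because $\Gamma_{\mathrm{meas}}$ scales linearly with the intracavity photon number, the same functions can be re-evaluated at other probe powers to see where the two rates cross.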

Formally, the output noise power spectral density (PSD) of the homodyne detector photocurrent, $I$, normalized to the shot-noise level is found by taking the Fourier transform of the autocorrelation of equation (2):

$$\bar S^{\mathrm{out}}_{II}(\omega) = 1 + \frac{4\Gamma_{\mathrm{meas}}}{x_{\mathrm{zpf}}^2}\left[\bar S_{xx}\sin^2(\theta) + \frac{\hbar}{2}\,\mathrm{Re}\{\chi_m\}\sin(2\theta)\right] \qquad (3)$$

where $\bar S_{xx}(\omega)$ is the noise PSD of the mechanical position fluctuations of the resonator and $\chi_m(\omega) = (m_{\mathrm{eff}}(\omega_m^2 - \omega^2 - i\gamma_i\omega_m))^{-1}$ is the mechanical susceptibility characterizing the response of the mechanical system to an applied force. The PSD $\bar S_{xx}(\omega)$ contains noise stemming from coupling to the thermal bath, quantum back-action noise from the light field and any other technical laser noise driving the mechanics (Supplementary Information). The three terms in $\bar S^{\mathrm{out}}_{II}(\omega)$ in equation (3), from left to right, are due to shot-noise, mechanical position fluctuations and the cross-correlation between the back-action noise force and mechanical position fluctuations. Only the third term can have a negative noise PSD and give rise to squeezing.
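
The structure of equation (3) can be explored numerically with a minimal single-mode sketch. It is not the authors' model: it assumes that the position noise is driven only by a white thermal force and by the back-action force of equation (1), so that equation (3) reduces to the dimensionless form written in the code comments, and it treats optical loss as a simple beam-splitter mixing with vacuum (the `efficiency` parameter). All names are hypothetical, and all rates are passed as rate/2π in Hz.

```python
import math


def normalized_psd(freq, theta, f_m, gamma_i, gamma_meas, n_bar, efficiency=1.0):
    """Shot-noise-normalized homodyne PSD for one mechanical mode.

    Assumed reduction of equation (3), with D = f_m^2 - freq^2 - i*gamma_i*f_m:
      S = 1 + 16*Gm*f_m^2*(gamma_i*n_bar + Gm)/|D|^2 * sin^2(theta)
            + 4*Gm*f_m*Re(1/D) * sin(2*theta)
    where Gm is the measurement rate. Loss is then applied as 1 + eta*(S - 1).
    """
    inv_d = 1.0 / complex(f_m ** 2 - freq ** 2, -gamma_i * f_m)
    position_term = (16.0 * gamma_meas * f_m ** 2
                     * (gamma_i * n_bar + gamma_meas) * abs(inv_d) ** 2)
    correlation_term = 4.0 * gamma_meas * f_m * inv_d.real
    s = (1.0 + position_term * math.sin(theta) ** 2
         + correlation_term * math.sin(2.0 * theta))
    return 1.0 + efficiency * (s - 1.0)


def min_over_angle(freq, steps=3600, **kwargs):
    """Smallest PSD found on a uniform grid of quadrature angles in [0, pi)."""
    return min(normalized_psd(freq, math.pi * k / steps, **kwargs)
               for k in range(steps))


# Illustrative parameters of the order of those reported in the Letter.
params = dict(f_m=28e6, gamma_i=172.0, gamma_meas=5.2e5, n_bar=1.2e4)
s_ideal = min_over_angle(27e6, **params)                   # unit detection efficiency
s_lossy = min_over_angle(27e6, efficiency=0.26, **params)  # with optical loss
print(f"min PSD 1 MHz below resonance: ideal {s_ideal:.3f}, lossy {s_lossy:.3f}")
```

In this sketch the intensity quadrature ($\theta = 0$) always returns the shot-noise level, and any value below 1 arises from the $\sin(2\theta)$ cross-correlation term, in line with the statement above.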

The primary hurdle to observing such squeezing, as in many quantum measurements, is strongly coupling to a preferred detection channel while simultaneously minimizing unwanted environmental perturbations. Most relevant to the work presented here are frequencies detuned from resonance, $\omega_m \gg |\delta \equiv \omega_m - \omega| \gg \gamma_i$, for which the approximate output noise PSD is

$$\bar S^{\mathrm{out}}_{II}(\omega) \approx 1 + \frac{2\Gamma_{\mathrm{meas}}}{\delta}\left[\sin(2\theta) + \frac{2\omega_m}{\delta}\,\frac{\bar n(\omega)}{Q_m}\,\sin^2(\theta)\right]$$

where we have assumed that the mechanical position fluctuations are predominantly due to thermal bath coupling (at high optical power, back-action heating and laser noise also contribute to the mode occupancy). In this case, fluctuations from the thermal bath limit appreciable squeezing of the optical probe field to a regime in which $\bar n(\omega)/Q_m < 1$. This requirement is equivalent to having $Q_m\omega_m \gtrsim k_B T_b/\hbar$, where $k_B$ is Boltzmann's constant and $T_b$ is the bath temperature. Appreciable squeezing also requires a detuning $|\delta| \gtrsim (\bar n/Q_m)\omega_m$ and a corresponding measurement rate larger than this detuning. Therefore, in the presence of thermal bath noise, squeezing over a significant spectral bandwidth requires not only a large cooperativity ($C \equiv \Gamma_{\mathrm{meas}}/\Gamma_{\mathrm{thermal}}$) between the optical and mechanical components, as realized in recent cooling experiments, but the more stringent requirement that the effective measurement back-action force be comparable to all forces acting on the mechanical resonator, including the elastic restoring force of the mechanical structure.
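
The role of the ratio $\bar n/Q_m$ in the detuned regime can be made concrete with a short sketch. It assumes the thermally dominated approximation $\bar S \approx 1 + (2\Gamma_{\mathrm{meas}}/\delta)[\sin(2\theta) + 2(\omega_m/\delta)(\bar n/Q_m)\sin^2\theta]$, which follows from equation (3) when $\bar S_{xx} \approx x_{\mathrm{zpf}}^2\gamma_i\bar n/\delta^2$; minimizing over $\theta$ then gives a closed form. The function names and the numerical values are illustrative, and the approximation breaks down once back-action contributes appreciably to the position noise.

```python
import math


def approx_psd(theta, delta, gamma_meas, f_m, nbar_over_q):
    """Approximate normalized PSD at detuning delta from the mechanical resonance."""
    r = (f_m / delta) * nbar_over_q
    bracket = math.sin(2.0 * theta) + 2.0 * r * math.sin(theta) ** 2
    return 1.0 + (2.0 * gamma_meas / delta) * bracket


def approx_min_psd(delta, gamma_meas, f_m, nbar_over_q):
    """Minimum of approx_psd over theta: 1 + (2*Gm/|delta|) * (r - sqrt(1 + r^2))."""
    d = abs(delta)
    r = (f_m / d) * nbar_over_q
    return 1.0 + (2.0 * gamma_meas / d) * (r - math.sqrt(1.0 + r * r))


# Illustrative numbers (rates as rate/2pi in Hz): 1 MHz detuning, nbar/Q = 0.072.
closed_form = approx_min_psd(1e6, 5.2e5, 28e6, 0.072)
grid_min = min(approx_psd(math.pi * k / 20000, 1e6, 5.2e5, 28e6, 0.072)
               for k in range(20000))
print(f"closed form {closed_form:.4f}, grid search {grid_min:.4f}")
```

As $r = (\omega_m/|\delta|)(\bar n/Q_m)$ grows, the factor $r - \sqrt{1 + r^2}$ tends to zero, which is the sense in which thermal noise confines appreciable squeezing to detunings $|\delta| \gtrsim (\bar n/Q_m)\omega_m$.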

To meet the requirements of strong measurement and efficient detection, we designed a 'zipper-style' optomechanical cavity with a novel integrated waveguide coupler fabricated from the 220-nm-thick silicon device layer of a silicon-on-insulator microchip (Fig. 1). The in-plane differential motion of the two beams at a fundamental frequency of $\omega_m/2\pi = 28$ MHz strongly modulates the co-localized fundamental optical resonance of the cavity with a theoretical vacuum coupling rate of $g_0/2\pi = 1$ MHz. As shown in Fig. 1b, we use a silicon waveguide with a high-reflectivity photonic crystal end-mirror to excite and collect light efficiently from the optical cavity. Light from the silicon waveguide is coupled to a single-mode optical fibre using an optical fibre taper and a combination of adiabatic mode coupling and transformation.

The experimental set-up used to characterize the zipper cavity system and measure the optomechanical squeezing of light is shown in Fig. 2a. The silicon sample is placed in a continuous-flow $^4$He cryostat with a cold-finger temperature of 10 K. A narrowband laser beam is used to probe the optomechanical system and measure the mechanical motion of the zipper cavity. A wavelength scan of the reflected signal from the cavity is plotted in Fig. 2b, showing an optical resonance with a linewidth $\kappa/2\pi = 3.42$ GHz at a wavelength of $\lambda_c = 1{,}540$ nm. Inefficiencies in the collection and detection of light result in additional uncorrelated shot-noise in the measured signal and can reduce the squeezing to undetectable levels. For the device studied here, the cavity coupling efficiency, corresponding to the percentage of photons sent into the cavity which are reflected, is determined to be $\eta_\kappa = 0.54$. The fibre-to-chip coupling efficiency is measured at $\eta_{\mathrm{CP}} = 0.90$. A homodyne detection scheme allows for high-efficiency detection of arbitrary quadratures of the optical signal field. Characterization and optimization of the efficiency of the entire optical signal path and homodyne detection system yielded an overall set-up efficiency of $\eta_{\mathrm{set\text{-}up}} = 0.48$, corresponding to a total signal detection efficiency of $\eta_{\mathrm{tot}} = \eta_{\mathrm{set\text{-}up}}\eta_\kappa = 0.26$.
Figure 2c shows the noise spectrum of the thermal motion of the mechanical resonator obtained by setting the laser frequency near the cavity resonance and tuning the relative local-oscillator phase of the homodyne detector, $\theta_{\mathrm{lock}}$, to measure the quadrature of the reflected signal in which mechanical motion is imprinted. The mechanical spectrum shows the in-plane differential mode of interest at $\omega_m/2\pi = 28$ MHz, as well as several other more weakly coupled mechanical resonances of the nanobeams and coupling waveguide. A high-resolution, narrowband spectrum of the in-plane differential mode is inset in Fig. 2c, and shows a linewidth of $\gamma_i/2\pi = 172$ Hz, corresponding to a mechanical Q-factor of $Q_m = 1.66 \times 10^5$. The vacuum coupling rate of the in-plane differential mode, measured from the detuning dependence of the optical spring shift and damping, is determined to be $g_0/2\pi = 750$ kHz, in good agreement with theory. From the calibration of the noise power under the Lorentzian distribution in Fig. 2c, the in-plane differential mode is found to thermalize at low optical probe power to a temperature of $T_b \approx 16$ K, corresponding to a thermal phonon occupancy of $\bar n(\omega_m) \approx 1.2 \times 10^4$. This yields a ratio $\bar n(\omega_m)/Q_m \approx 0.072$, which is well within the regime where squeezing is possible.

Figure 1 | Optomechanical device. a, Scanning electron microscope image of a waveguide-coupled zipper optomechanical cavity. The waveguide width is adiabatically tapered along its length and terminates with a photonic crystal mirror next to the cavity. The tapering of the waveguide allows for efficient input-output coupling and the photonic crystal termination makes the coupling to the cavity single sided. Two zipper cavities are coupled above and below the waveguide, each with a slightly different optical resonance frequency, allowing them to be separately addressed. b, Left: close-up of the coupling region between one of the cavities and the waveguide. Right: finite-element method (FEM) simulation of the cavity field leaking into the waveguide (log scale). Note that the field does not leak into the mirror region of the waveguide. c, Top: FEM simulation showing the in-plane electrical field of the fundamental optical cavity mode. Bottom: FEM simulation of the displacement of the fundamental in-plane differential mode of the structure with frequency $\omega_m/2\pi = 28$ MHz. The mechanical motion, modifying the gap between the beams, shifts the optical cavity frequency, leading to optomechanical coupling.

Figure 2 | Experimental set-up and device characterization. a, The optical signal is derived from an external-cavity diode laser and is sent into a tapered optical fibre inside a liquid-helium cryostat where the silicon sample is cooled to $T \approx 16$ K. The fibre taper is used to couple light evanescently into the silicon optomechanical device. The optical reflection from the device is collected by the same fibre taper and sent to a balanced homodyne detector (BHD) for characterization. For further details of the experimental set-up, see Methods Summary. The efficiencies of the circulator, switch and BHD are respectively denoted 1723, 734 and 1747p. AOM, acousto-optic modulator; EDFA, erbium-doped fibre amplifier; ENA, electronic network analyser; FPC, fibre polarization controller; FS, fibre stretcher; IM, intensity modulator; λ-meter, wavemeter; LPF, low-pass filter; PD, photodetector; PM, power meter; RSA, real-time spectrum analyser; VC, variable coupler; VOA, variable optical attenuator. b, Top: reflected signal from the optical cavity at low optical power ($\langle n_c\rangle \approx 10$; linewidth, $\kappa/2\pi = 3.42$ GHz). Bottom: high-power ($\langle n_c\rangle = 790$)
reflected signal, showing the cavity-laser detuning (dashed line) locked to during squeezing measurements. c, Homodyne noise PSD of the reflected signal showing the transduced thermal Brownian motion of the zipper cavity at $T_b = 16$ K (green curve; $\langle n_c\rangle = 80$). The red curve is the shot-noise level and the black curve is the detector's dark noise (in the absence of light input). Inset, close-up of the fundamental in-plane differential mechanical mode of the zipper cavity (fitted by blue curve; linewidth, $\gamma_i/2\pi = 172$ Hz). d, Mean value of the PSD of the BHD as a function of the local-oscillator (LO) power (signal blocked). The filled data point indicates the local-oscillator power used in the squeezing measurements. The red and dashed black curves correspond to a linear fit to the data and the level of the detector dark current, respectively. e, Noise PSD as a function of $\theta_{\mathrm{lock}}$ (ranging from 0 (green) to $\pi$ (red)) with the signal detuned far off resonance at $\Delta/\kappa \approx 30$, referenced to the noise level with the signal blocked (blue).

To study accurately the noise properties of the reflected optical signal from the cavity, we make a series of measurements to characterize our laser and detection set-up. Figure 2d shows the measured noise PSD of the balanced homodyne detector for $\omega \approx \omega_m$ as a function of local-oscillator power, indicating a linear dependence on power and negligible (0.1%) added noise above shot-noise. In the measured squeezing data discussed below, a local-oscillator power of 3 mW is used. Calibration of the laser intensity and frequency noise over the frequency range of interest ($\omega/2\pi = 1$-$40$ MHz) is described in Supplementary Information. The laser intensity noise is measured to be dominated by shot-noise over this frequency range, and the laser frequency noise is measured to be roughly flat in frequency at a level of $\bar S_{\omega\omega} \approx$ 5 × 10° rad² Hz, which contributes insignificantly to the detected noise floor. Figure 2e shows the measured homodyne detector noise level normalized to vacuum noise for reflected laser light far-detuned from the optical cavity resonance. The small (≈ ±0.15%) deviation in the measured noise level bounds the systematic uncertainty in the detector gain versus quadrature bias point as well as the contribution to the measured noise from optical elements other than the silicon cavity.

Measurements of the noise in the reflected optical signal from the cavity as a function of quadrature angle, frequency and signal power are shown in Figs 3 and 4. These measurements are performed for laser light on resonance with the optical cavity and for input signal powers varying from 252 nW to 3.99 μW, with the maximum signal power corresponding to an average intracavity photon number of $\langle n_c\rangle = 3{,}153$. The laser is set at the appropriate cavity detuning for each signal power by scanning the wavelength across the cavity resonance while monitoring the reflection, and then stepping the laser frequency towards the cavity from the long-wavelength side until the reflection matches the level that corresponds to a detuning of $\Delta_{\mathrm{lock}}/\kappa = 0.044 \pm 0.006$. This produces a shift of $\phi = (0.73 \pm 0.03)\pi$ in the measured quadrature angle from the on-resonance condition of equations (1)-(3) (Methods Summary).

In Fig. 3, we plot the theoretically predicted and measured noise PSDs versus quadrature angle for a signal power corresponding to $\langle n_c\rangle = 790$ photons. Each quadrature spectrum is the average of 150 traces, and after every other spectrum the signal arm is blocked and the shot-noise PSD is measured. The shot-noise level, corresponding to optical vacuum on the signal arm, is used to normalize the spectra. At certain quadrature angles, and for frequencies a few megahertz around the mechanical resonance frequency, we find that the light reflected from the zipper cavity shows a noise PSD below that of vacuum. The density plot of the theoretically predicted noise PSD (Fig. 3a) shows the expected wideband squeezing due to the strong optomechanical coupling in these devices, as well as a change in the phase angle where squeezing is observed below and above the mechanical frequency. This change is due to the change in sign of the mechanical susceptibility and the corresponding change in phase of the mechanical response to RPSN. The measured noise PSD density plot (Fig. 3b) shows the presence of several other mechanical noise peaks and a reduced squeezing bandwidth, yet the overall phase- and frequency-dependent characteristics of the squeezing around the strongly coupled in-plane mechanical mode are clearly present. In particular, Fig. 3c and Fig. 3d show two slices of the noise PSD density plot in which the region of squeezing changes from being below the mechanical resonance frequency to being above it.

8 AUGUST 2013 | VOL 500 | NATURE | 187
©2013 Macmillan Publishers Limited. All rights reserved

Figure 3 | Optomechanical squeezing of light. a, Theoretical model. Density plot of the predicted reflected-signal noise PSD, as measured on a balanced homodyne detector and normalized to shot-noise, for a simplified model of the optomechanical system (Supplementary Information). Areas with noise below shot-noise are shown in blue shades on a linear scale. Areas with noise above shot-noise are shown in orange shades on a log scale. The solid white line is a contour delineating noise above and below shot-noise. b, Experimental data. Density plot of the measured reflected-signal noise PSD for $\langle n_c\rangle = 790$, normalized to the measured shot-noise level. c, Slice of the measured density
plot in b taken at $\theta_{\mathrm{lock}}/\pi = 0.23$. d, Slice of the measured density plot in b taken at $\theta_{\mathrm{lock}}/\pi = 0.16$. In c and d, the black curve corresponds to the measured data slice extracted from b. The dark blue traces are several measurements of the shot-noise level (average shown in light blue). Also shown is a model of the squeezing in the absence of thermal noise (orange curve), the same model with ohmic thermal noise of the mechanical mode included (green) and a full noise model including additional phenomenological noise sources (red curve). The vertical white dashed lines in a and b indicate the data slices shown in c and d.

Figure 4 | Spectral and power dependence of noise. a, Measured balanced homodyne noise power of the reflected signal at $\omega/2\pi = 27.9$ MHz (filled circles) versus quadrature angle ($\Delta_{\mathrm{lock}}/\kappa = 0.044$ and $\langle n_c\rangle = 790$). The green and red curves correspond to the single-mode and full noise models, respectively. The orange curve represents a model including the response of the mechanical mode in the absence of thermal noise, that is, when driven by RPSN only, and the dashed blue curve shows the thermal noise component. Inset, close-up of boxed region. b, Measured minimum noise PSD normalized to shot-noise (filled circles) versus $\langle n_c\rangle$. Left: maximum squeezing for $\omega < \omega_m$; right: maximum squeezing for $\omega > \omega_m$. Also shown are the single-mode noise model (green curve) and the full noise model (red curve). c, Balanced homodyne noise PSD of the reflected cavity signal for $\Delta_{\mathrm{lock}}/\kappa = 0.052$ and $\langle n_c\rangle = 3{,}153$. Left: phase quadrature corresponding to maximum transduction of mechanical motion; right: phase quadrature corresponding to minimum transduction of mechanical motion. In each plot, the black curve is the measured data with the shot-noise level subtracted. Also shown are the modelled laser phase noise (dashed brown curve), the noise contribution from a single mechanical mode (dashed blue curve) and the full noise model (red curve).

In Fig. 4a, we show the measured noise PSD as a function of quadrature angle for a frequency slice at $\omega/2\pi = 27.9$ MHz of the data shown in Fig. 3b. The measured squeezing and anti-squeezing are seen to be respectively smaller and larger than expected from a model of the optomechanical cavity without thermal noise. We also plot, in Fig. 4b, the maximum measured and modelled squeezing as functions of signal power. The simple theory predicts a squeezing level that monotonically increases with signal power, whereas the measured maximum squeezing saturates at a level of 4.5 ± 0.2% below the shot-noise at an intracavity power corresponding to $\langle n_c\rangle = 1{,}984$ photons. The error in the squeezing is dominated by the uncertainty in the linearity of the detector gain (±0.15%) and the variance of the measured shot-noise level (±0.1%).

To understand the processes that limit the bandwidth and magnitude of the measured squeezing, we plot in Fig. 4c the noise PSD for phase quadratures that maximize (left plot) and minimize (right plot) the transduction of the mechanical mode peak. Along with the measured data, we also plot the estimated noise due to phase noise of the signal laser, and that for a model of a single mechanical mode coupled to a thermal bath at $T_b = 16$ K. Low-frequency noise in the motion quadrature shows an $\omega^{-1}$ frequency dependence consistent with structural damping effects, but is much larger than that of the single-mode noise model. Noise in the quadrature that minimizes transduced motion is orders of magnitude larger than the noise predicted by the single-mode model and laser phase noise, and shows an w \ frequency dependence. The optical power dependence of the low-frequency noise in the motion quadrature indicates that optical absorption heats the structure to $T_b \approx 30$ K at the highest measured powers. As detailed in Supplementary Information, the red curve in each of the plots in Fig. 4 shows a full noise model incorporating structural damping noise from higher-frequency mechanical modes, optical absorption heating and a phenomenological « ”” noise term. These models indicate that the currently obtainable levels of squeezing are limited by the thermal noise of higher-order mechanical modes.
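
Squeezing levels in this Letter are quoted both as a percentage below shot-noise and in decibels, and they are measured after optical loss. The sketch below converts between the two conventions and applies a simple beam-splitter loss model, in which a fraction `efficiency` of the signal reaches the detector and the remainder is replaced by vacuum. The loss model and the function names are assumptions made for illustration, not the authors' calibration procedure.

```python
import math


def percent_below_to_db(percent_below):
    """Noise level in dB relative to shot-noise for a given percentage reduction."""
    return 10.0 * math.log10(1.0 - percent_below / 100.0)


def db_to_noise_ratio(level_db):
    """Noise power relative to shot-noise (1.0 = shot-noise) for a level in dB."""
    return 10.0 ** (level_db / 10.0)


def detected_noise(source_noise, efficiency):
    """Normalized noise after loss: vacuum replaces the lost fraction of the light."""
    return 1.0 + efficiency * (source_noise - 1.0)


def source_noise_from_detected(detected, efficiency):
    """Invert detected_noise to estimate the noise before the loss."""
    return 1.0 + (detected - 1.0) / efficiency


measured_db = percent_below_to_db(4.5)   # the 4.5% figure quoted above
six_db_ratio = db_to_noise_ratio(-6.0)   # noise ratio for 6 dB of squeezing
print(f"4.5% below shot-noise is {measured_db:.2f} dB")
print(f"6 dB below shot-noise is a noise ratio of {six_db_ratio:.3f}")
```

Under this loss model, a detected level can be referred back to the source with `source_noise_from_detected`, although that inference is only as good as the assumed efficiency.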

These measurements show that by reflecting light off a thin-film mechanical resonator undergoing large-amplitude thermal motion, light that is in certain respects 'quieter' than vacuum can be obtained. In contrast to previous work with ultracold gas-phase atoms, which used a narrowband atomic resonance and operated in short bursts owing to the atomic trap lifetime, the solid-state devices in this work allow for steady-state squeezing over almost 10 MHz of bandwidth and at optical frequencies completely tailorable through geometric design. The modest level of squeezing, limited by thermal noise and structural damping effects in the current devices, may also be substantially improved by increasing the mechanical Q-factor. Measurements of similar silicon devices with different surface treatments have yielded mechanical Q-factors as high as 7 × 10° (Supplementary Information), which, for the optical power levels used here, should enable on-chip squeezing 6 dB below shot-noise. Given the integrability of these microchip devices, optical extraction inefficiencies can be avoided by sending the squeezed light generated by one device directly into a second probe device. For example, such an on-chip squeezer and detector could form the basis of a quantum-enhanced micromechanical displacement and force sensor. More generally, we expect these sorts of device to enable future experiments involving feedback and strong measurement of the dynamics of a mechanical system.


METHODS SUMMARY 


Experimental set-up. A tunable external-cavity diode laser, actively locked to a 
wavemeter, is used to generate a strong local oscillator and the measurement 
signal. Fractions of the local oscillator and input signals are split off and detected, 
using intensity modulators to stabilize their power levels. Fibre polarization con- 
trollers adjust the polarization of the local oscillator and signal. A variable optical 
attenuator is used to set the signal power and an acousto-optic modulator is used to 
generate a tone for calibration (Supplementary Information). The reflected signal 
from the cavity is separated using a circulator, and is switched between one of three 
detection paths: one contains a power meter for power calibration, one contains a 
photodetector (PD1) for spectroscopy of the cavity and one contains an erbium- 
doped fibre amplifier for measurement of the mechanical spectrum on a real-time 
spectrum analyser or a network analyser. Squeezing of the reflected cavity signal is 
measured on a fourth path containing a variable coupler, where the signal is 
recombined with the local oscillator and detected on a balanced homodyne 
detector. 

Homodyne phase angle. The relative phase between the local oscillator and the signal is determined from the low-pass-filtered component of the signal detected using a balanced homodyne detector, and set using a fibre stretcher. Locking the level of this signal determines the phase angle between the local oscillator and the reflected signal, which we call $\theta_{\mathrm{lock}}$. This angle differs from the phase $\theta$ between the light input to the cavity and the local oscillator, but is related to it through the phase response of the cavity: $\theta_{\mathrm{lock}} = \theta - \phi$, where $\phi(\Delta) = \mathrm{Arg}[1 - \kappa_e/(i\Delta + \kappa/2)]$ and $\kappa_e$ is the extrinsic cavity coupling rate.
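
The size of the phase offset can be estimated directly from the expression for $\phi(\Delta)$. The sketch below assumes that the ratio $\kappa_e/\kappa$ can be identified with the cavity coupling efficiency of 0.54 quoted for this device and takes the detuning as positive; both the identification and the sign convention are assumptions, and the function name is hypothetical.

```python
import cmath
import math


def cavity_phase(detuning_over_kappa, coupling_ratio):
    """phi = Arg[1 - kappa_e / (i*Delta + kappa/2)] with all rates in units of kappa.

    detuning_over_kappa is Delta/kappa and coupling_ratio is kappa_e/kappa.
    """
    reflection = 1.0 - coupling_ratio / complex(0.5, detuning_over_kappa)
    return cmath.phase(reflection)


# Lock point Delta_lock/kappa = 0.044 with an assumed kappa_e/kappa = 0.54.
phi = cavity_phase(0.044, 0.54)
phi_over_pi = phi / math.pi
print(f"phi ~ {phi_over_pi:.2f} pi")
```

Because the cavity is close to critically coupled in this picture, the reflection coefficient is small near resonance and its phase changes rapidly with detuning, which is why a small uncertainty in the lock point translates into a noticeable uncertainty in the quadrature angle.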

Cavity lock. A lock point slightly red detuned from resonance is chosen to avoid instabilities of the system resulting from thermo-optical effects. The laser is locked to this frequency using a wavemeter with a frequency resolution of $\pm 0.0015\kappa$. Drift of the optical cavity resonance over a single noise spectrum measurement (order of minutes) is found to be negligible. An estimate of the variance of $\Delta_{\mathrm{lock}}$ is determined from the dependence of the transduction of the mechanical motion on the quadrature phase, which indicates that from one lock to another $\Delta_{\mathrm{lock}}/\kappa = 0.044 \pm 0.006$.

The growth and reduction of Northern Hemisphere ice sheets over the past million years is dominated by an approximately 100,000-year periodicity and a sawtooth pattern (gradual growth and fast termination). Milankovitch theory proposes that summer insolation at high northern latitudes drives the glacial cycles, and statistical tests have demonstrated that the glacial cycles are indeed linked to eccentricity, obliquity and precession cycles. Yet insolation alone cannot explain the strong 100,000-year cycle, suggesting that internal climatic feedbacks may also be at work. Earlier conceptual models, for example, showed that glacial terminations are associated with the build-up of Northern Hemisphere 'excess ice', but the physical mechanisms underpinning the 100,000-year cycle remain unclear. Here we show, using comprehensive climate and ice-sheet models, that insolation and internal feedbacks between the climate, the ice sheets and the lithosphere-asthenosphere system explain the 100,000-year periodicity. The responses of equilibrium states of ice sheets to summer insolation show hysteresis, with the shape and position of the hysteresis loop playing a key part in determining the periodicities of glacial cycles. The hysteresis loop of the North American ice sheet is such that after inception of the ice sheet, its mass balance remains mostly positive through several precession cycles, whose amplitudes decrease towards an eccentricity minimum. The larger the ice sheet grows and extends towards lower latitudes, the smaller is the insolation required to make the mass balance negative. Therefore, once a large ice sheet is established, a moderate increase in insolation is sufficient to trigger a negative mass balance, leading to an almost complete retreat of the ice sheet within several thousand years. This fast retreat is governed mainly by rapid ablation due to the lowered surface elevation resulting from delayed isostatic rebound, which is the lithosphere-asthenosphere response. Carbon dioxide is involved, but is not determinative, in the evolution of the 100,000-year glacial cycles.

Several internal feedback mechanisms have been suggested as crucial in 100-kyr glacial cycles, such as delayed bedrock rebound, the calving of ice-sheet margins, CO₂ variations, ocean feedback and dust feedback. The importance of these mechanisms needs to be investigated with physical models. Here we report numerical experiments with an ice-sheet model for the Northern Hemisphere, IcIES, in combination with the general circulation model (GCM) MIROC (Methods and Supplementary Fig. 1). Although it is not practical to run GCMs with fully coupled ice-sheet models on glacial-interglacial timescales, it is necessary to take into account the feedback from ice sheets on climate. In this study, a climate parameterization for the ice-sheet model is developed and calibrated using a suite of multi-snapshot atmospheric GCM experiments forced with different insolation values (for different eccentricities, obliquities and precessions), CO₂ concentrations and ice-sheet sizes, calculated in advance. The ice-sheet model with the climate parameterization (IcIES-MIROC) can represent fast feedbacks, such as water vapour, cloud and sea-ice feedbacks, and slow feedbacks, such as albedo/temperature/ice-sheet and lapse-rate/temperature/ice-sheet feedbacks. We calculate the ice-sheet variation for the past 400 kyr forced by the insolation and atmospheric CO₂ content with improved dating after running the simulation long enough to remove the dependence on the initial conditions (Figs 1a, b; Methods). After validating these results using palaeoclimate proxy data, we conducted sensitivity experiments to investigate the mechanism of ~100-kyr glacial cycles.

Our model realistically simulates the sawtooth characteristic of glacial cycles, the timing of the terminations and the amplitude of the Northern Hemisphere ice-volume variations (Fig. 1d) as well as their geographical patterns at the Last Glacial Maximum and the subsequent deglaciation (Supplementary Figs 2 and 3 and Supplementary Video 1). In the frequency domain, our model produces the largest spectral peak at a periodicity of ~100 kyr, as observed in the data (Fig. 1), even without the ocean feedback or dust feedback. In a series of model experiments, we investigated the roles of CO₂ (which also varies with a 100-kyr periodicity; Fig. 1b), various model parameters such as the time constant and the effective mantle density for isostatic rebound, and mass loss due to calving into proglacial lakes. The ~100-kyr periodicity, the sawtooth pattern and the timing of the terminations are reproduced with constant CO₂ levels (for example 220 p.p.m.; Fig. 1e), and are robust for a range of model parameters (Supplementary Fig. 4).

By contrast, the spectral peak of ~100-kyr cycles is greatly reduced, and permanent large ice sheets remain, with the imposition of instantaneous isostatic rebound (Fig. 1f). This result supports the idea that the crucial mechanism for the ~100-kyr cycles is the delayed glacial isostatic rebound, which keeps the ice elevation low, and, therefore, the ice ablation high, while the ice sheet retreats. We note, however, that CO₂ variations can result in amplification of the full magnitude of ice-volume changes during the ~100-kyr cycles, but do not drive the cycles. Ice-sheet changes may induce variations in CO₂ through changing sea surface temperature, affecting the solubility of CO₂ (ref. 25), and through changing sea level, affecting the stratification of and CO₂ storage in the Southern Ocean. During deglaciation, the melt water may affect ocean circulation, leading to an increase in atmospheric CO₂ (refs 23, 26, 27).

A striking feature of our results is that, in the experiments with 
constant CO, levels, the strong ~ 100-kyr cycle with a large amplitude 
appears only for the North American ice sheet within a particular 
range of CO, levels; the spectral peak of ~100-kyr cycle becomes small 
compared with those of ~41 and ~23-kyr cycles for CO, levels above 
230p.p.m. or below 190p.p.m. (Fig. 1g). The Eurasian ice sheet 
responds only to insolation forcings at ~41-kyr and ~23-kyr peri- 
odicities, with small amplitudes in all cases (Fig. 1h). To investigate the 
mechanisms behind these observations, we conducted 200-kyr model 
experiments to obtain stable equilibria of both ice sheets for a range of 
prescribed climatic forcings, starting from either no ice or from large
ice sheets; we use summer temperature anomalies ranging from −5 to
+3 K relative to the present day.

Because of strong albedo and topographic feedbacks, ice sheets are 
expected to have multiple stable equilibria. We indeed observe two
different equilibrium states for a range of climatic forcings, depending 
on the initial size of the ice sheets. Figure 2a shows maps of the equi- 
librium ice sheets and their corresponding surface mass balances, for 
various climate forcings, computed with large initial ice sheets. We also 
show the equilibrium volumes of the North American and Eurasian ice 
sheets versus the climate forcing (Fig. 2b), which both have hysteresis 
loops but with different shapes. For each ice sheet, the lower and upper 
branches in the ice-volume hysteresis loop (Fig. 2b, blue and red lines) 
correspond to equilibrium states resulting from small and large initial
states, respectively. The hysteresis branches define the ice-sheet states
with neutral (equilibrium) mass balance for a given climatic forcing;
the ice sheet gains (or loses) mass if the climatic forcing falls below the
lower branch (or rises above the upper branch). Crucially, the larger 
the ice sheet becomes, the smaller the forcing required for negative 
mass balance, as is reflected in the inclination of the upper branch. The 
positions and shapes of the hysteresis loops, and especially the inclina- 
tions of the upper branches, are quite different for the two ice sheets. 
The equilibrium states on the upper hysteresis branch of the North 
American ice sheet vary gradually over a wider range of forcings, from 
−2 to +2 K, relative to those of the Eurasian ice sheet, which range
from −2 to −1 K (Fig. 2a, b).
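
The idea of two equilibrium states for the same forcing can be illustrated with a deliberately abstract toy system. The sketch below is not the ice-sheet model used here: the equation dx/dt = x - x^3 - f, the variable x (a normalized "ice volume") and the forcing values are all assumptions chosen only because this equation is a standard minimal example of bistability.

```python
def equilibrium(forcing, x0, dt=0.01, steps=20000):
    """Integrate the toy equation dx/dt = x - x**3 - forcing with forward Euler.

    x is a dimensionless stand-in for ice volume and `forcing` a dimensionless
    stand-in for the summer temperature anomaly (both hypothetical).
    """
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3 - forcing)
    return x


# Same forcing, different initial sizes: the runs settle on different branches.
small_start = equilibrium(forcing=0.0, x0=-0.5)  # lower branch
large_start = equilibrium(forcing=0.0, x0=0.5)   # upper branch

# A sufficiently warm forcing removes the upper branch, so both runs agree.
warm_small = equilibrium(forcing=0.6, x0=-0.5)
warm_large = equilibrium(forcing=0.6, x0=0.5)
```

Sweeping `forcing` up and then down while reusing the previous end state as the next `x0` would trace a hysteresis loop in this toy system, which is the qualitative behaviour described for the equilibrium ice volumes.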

To identify the physical mechanisms causing ~100-kyr cycles, we 
compare the results of equilibrium states with the simulated transient 
ice volume of the standard case with varying insolation and CO2 forcings
for the most recent glacial cycle (Fig. 1d); these data are plotted
together in Fig. 2b. To enable the comparison, we converted insolation
and CO2 forcings to the summer temperature anomaly (Methods).
For the North American ice sheet, starting from the last interglacial 
forcing, 122 kyr before present (BP), with no ice, a rapid decrease in 
insolation well below the lower branch forces the mass balance to 
become positive and large, triggering the inception and growth of 
the ice sheet from the Canadian high Arctic, around latitude 70° N, 
to Labrador. Although the summer insolation maxima are large for the 
first two precessional cycles because of large eccentricity (104 and
84 kyr BP; Fig. 2b), the mass balance becomes negative only for a few
thousand years because the upper hysteresis branch extends to high 
forcing values for small volumes. 

As the ice sheet grows, the insolation forcing required for negative 
mass balance gradually becomes smaller. However, the reduction in 
eccentricity also makes the subsequent insolation maxima smaller, so 
the ice sheet continues to experience mostly a positive or near-neutral 
mass balance. By the fifth precession minimum (24 kyr BP) since the
most recent interglacial period, near the eccentricity minimum, the 
volume of the North American ice sheet reaches nearly 90 m sea-level 
equivalent (that is, a volume equivalent to a change of 90 m in global 
sea level). At this stage, the southern margin of the large ice sheet is 
warm enough that a moderate climatic forcing can cause the ice sheet 
to retreat. With the subsequent increase in eccentricity, the summer 
insolation forcing in the next precessional cycle provides enough time 
and intensity for a rapid disintegration of the ice sheet (note the large 
excursion of insolation forcing above the upper hysteresis branch; 
Fig. 2b), which is why a large ice volume, called “excess 100-kyr ice”, 
is observed before each glacial termination. 

The upper branch in our hysteresis loop defines the threshold in ice 
volume and insolation at which the switch from a glacial state to a 
deglacial state occurs. Whereas the growth rate is governed by the 
gradual accumulation of snow, the retreat rate is governed by highly 
nonlinear processes such as the large ablation of ice that results from
the low surface elevation due to the delayed isostatic response. Other
processes may enhance the fast retreat, such as calving into proglacial
lakes (Supplementary Fig. 2), increasing CO2 concentrations, dust
feedback, vegetation feedback and basal sliding.

In contrast to the North American ice sheet, the dominant cycle of
the volume of the Eurasian ice sheet has a period of ~40 kyr, and the
volume never grows beyond 40 m sea-level equivalent. This pattern
occurs for two reasons. First, the hysteresis branches of the ice volume
are located within the lower half of the range of possible forcing variations
(Fig. 2b). Thus, the ice sheet loses mass for a long time during
insolation cycles. The difference in the positions of the hysteresis branches
stems from the summers being generally warmer over Eurasia than over
North America at high latitudes. Second, and more fundamentally, the
upper hysteresis branch shows a step change, similar to that which
occurs over Antarctica, whereby the volume decreases by 120 m sea-level
equivalent for an increase in climatic forcing of only 1 K. Thus,
regardless of the mean climatic state, the Eurasian ice sheet would not
show ~100-kyr cycles because it cannot sustain intermediate ice
volumes under the widely varying summer insolation forcing (equivalent
to 6 K in a precessional cycle); the ice sheet can be only very large
or very small (not shown). In summary, the shape and position of the
hysteresis curve, different for each continent and for different constant
CO2 levels, are important in determining whether the dominant climatic
cycle is ~100 kyr or ~40 kyr in period.

Figure 1 | Time series of forcing and responses of Northern Hemisphere ice
sheets. Left, time series of the past 400 kyr; right, corresponding spectra.
a, Mean extra-atmospheric insolation at latitude 65° N on 21 June of each year,
which closely corresponds to the summer solstice. b, Atmospheric CO2 from
Vostok ice core on a revised timescale (ref. 23 and references therein).
c, δ¹⁸O from benthic foraminifera as a proxy for sea level and deep ocean
temperature. d, Modelled sea-level equivalent (SLE) of ice-volume changes
relative to present with variations in atmospheric CO2 content and insolation
(standard case). e, Same as d but with a constant CO2 concentration of
220 p.p.m. f, Same as e but with instant isostatic rebound. g, Same as d but with
different constant CO2 concentrations (blue, 160 p.p.m.; black, 220 p.p.m.; red,
260 p.p.m.) for the North American ice sheet. h, Same as g but for the
Eurasian ice sheet. The spectra (right) show the amplitudes (calculated by the
Multi-Taper Spectral Analysis Method (MTM) using AnalySeries;
http://www.lsce.ipsl.fr/logiciels/index.php) at the corresponding frequencies of
the time series (left). The coloured dots indicate peaks with more than 95%
significance for the corresponding coloured curves.

Figure 2 | Hysteresis of equilibrium states and
transient evolution of the Northern Hemisphere
ice sheets. a, Maps showing the equilibrium shapes
and surface mass balances of ice sheets when the
climatic anomalies relative to present conditions
are respectively (left to right) −2, −1, 0 and 1 K
(summer temperature) and when the model runs
start from large initial ice sheets. Colours indicate
the surface mass balance in metres per year. Note
the large ablation areas and ablation rates (negative
mass balance) that appear in the warm low
latitudes. b, Modelled equilibrium and transient ice
volumes as functions of the summer temperature
anomaly for the North American (left) and
Eurasian (right) ice sheets: red dots denote the
large-volume equilibrium states if the model runs
start from large initial ice sheets; blue dots show the
small-volume equilibrium states for small initial ice
sheets. The blue areas indicate a positive total mass
balance of the ice sheet; red areas indicate a
negative total mass balance. The black dots mark
the evolution of the transient ice volume every 2 kyr
for the last glacial cycle starting 122 kyr BP. The
small numbers on the black trajectories show the
corresponding time in kiloyears. The horizontal
scales below the figures show the relation between
the temperature anomaly (Methods) and the
corresponding insolation at latitude 65° N on 21
June for two given constant atmospheric CO2
concentrations (220 p.p.m. and 280 p.p.m.).
c, Same as b but data shown as time series for the
past two glacial cycles.

The ice sheets behave as a dynamical system: an ice sheet tends to
approach a stable equilibrium state that also changes with time as the
climatic forcing changes. This behaviour is illustrated in Fig. 2c, which
shows the time series of the volume evolution (black lines) and the 
attracting steady states (blue and red lines) corresponding to the hys- 
teresis branches in Fig. 2b. Points where the stable equilibrium state 
lines cross correspond to changes in the sign of the mass balance and, 
thus, to changes between growing and shrinking ice sheets. The dif- 
ferent timescales for growth (~10* yr) and decay (~10° yr) result in 
the decreases in volume evolution (Fig. 2c, black curves) to be much 
steeper than the increases. This asymmetry ultimately explains the 
characteristic sawtooth shape of the glacial cycles. 

To understand the relative importance of the three astronomical 
parameters in generating the ~100-kyr cycles of the North American 
ice sheet, we conducted model experiments in which we kept fixed the 
eccentricity, obliquity or precession in turn, under a constant CO2
concentration of 220 p.p.m. Results show that the ~100-kyr cycles 
persist for fixed obliquity, but not for fixed eccentricity or for fixed 
precession (Fig. 3 and Supplementary Fig. 6). These results demon- 
strate the essential role of precession and the eccentricity variation for 
the ~100-kyr cycle. Obliquity is not the driver of the ~100-kyr cycle, 
although it helps to amplify the ice-volume changes from glacial states 
to interglacial states. In summary, our model results suggest that the 
~100-kyr cycle is essentially produced by the eccentricity modulation 
of precession amplitude through the changes in summer insolation,
with the support of obliquity for glacial terminations, especially when
eccentricity remains small after its minimum (for example at termination
I, 20-10 kyr BP, and at termination IV, 340-330 kyr BP).

Figure 3 | Role of eccentricity, obliquity and precession in the 100-kyr cycle.
Time series of the model experiments with one of eccentricity, obliquity or
precession fixed for a constant atmospheric CO2 concentration of 220 p.p.m.
a, Insolation forcing (insolation at latitude 65° N on 21 June) with variations in 
eccentricity, obliquity and precession (black lines); with obliquity fixed at 23.5° 
(red lines); with eccentricity fixed at 0.02 (blue lines); and with perihelion 
passage fixed at the spring equinox and no precession (green lines). 

b, Corresponding spectra of insolation change in a (as in Fig. 1a). c, Calculated 
ice-volume change, expressed as sea-level equivalent (colours same as in a). 
d, Corresponding spectra of calculated ice-volume change in c (as in Fig. 1d). 


A remarkable conclusion from our model results is therefore that 
the 100-kyr glacial cycle exists only because of the unique geographic 
and climatological setting of the North American ice sheet with respect 
to received insolation. Only for the North American ice sheet is the 
upper hysteresis branch moderately inclined; that is, there is a gradual 
change between large and small equilibrium ice-sheet volumes over a 
large range of insolation forcings. For this reason, as demonstrated in 
Fig. 2b, the amplitude modulation of summer insolation variation in 
the precessional cycle, due primarily to eccentricity, is able to generate 
the 100-kyr cycles with large amplitude, gradual growth and rapid 
terminations. 
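
The phrase "amplitude modulation of summer insolation variation in the precessional cycle, due primarily to eccentricity" can be made concrete with a synthetic signal. The sketch below is only an illustration: the sinusoidal eccentricity (mean 0.03, swing 0.02, period 100 kyr) and the 23-kyr sine are assumed round numbers, not the astronomical solution used in the model experiments.

```python
import math


def precession_forcing(t_kyr):
    """Toy precession signal: an eccentricity envelope times a ~23-kyr sine.

    All constants are illustrative assumptions, not orbital-solution values.
    """
    eccentricity = 0.03 + 0.02 * math.cos(2 * math.pi * t_kyr / 100.0)
    return eccentricity * math.sin(2 * math.pi * t_kyr / 23.0)


def peak_amplitude(t_start, t_end, step=0.25):
    """Largest absolute forcing sampled every `step` kyr in [t_start, t_end]."""
    n = int(round((t_end - t_start) / step))
    return max(abs(precession_forcing(t_start + i * step)) for i in range(n + 1))


# One precession cycle near the eccentricity maximum (t = 0 kyr) ...
high_ecc_peak = peak_amplitude(0.0, 23.0)
# ... and one near the eccentricity minimum (t = 50 kyr).
low_ecc_peak = peak_amplitude(39.0, 62.0)
```

In this toy signal the precessional swings near the eccentricity minimum are several times weaker than those near the maximum, which is the sense in which eccentricity sets the size of each insolation maximum.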


METHODS SUMMARY 


A climate parameterization, taking into account the relevant climatic factors that 
control the ice-sheet evolution, is obtained from a suite of experiments using the 
MIROC GCM. On the basis of this climate parameterization, we drive the ther- 
momechanically coupled shallow-ice-sheet model IcIES to study the impact of 
insolation and atmospheric CO2 content on the change of Northern Hemisphere
ice sheets. The experimental methods using the IcIES-MIROC model follow 
ref. 22 with a few modifications, as follows. The interaction between ice-sheet 
volume/area and the surface temperature is composed of the lapse-rate and albedo 
effects, and, as a novel term, the stationary-wave feedback, which is expressed as a 
spatial pattern of a temperature anomaly determined by GCM runs, weighted by a 
factor that depends on the ice-covered area over North America. The modifica- 
tions in the ice sheet model concern the parameterization for basal sliding over the 
sediment and hard rock, which uses a realistic map of sediment thickness; the 
calving parameterization at the margin in terms of prescribed grounding-line flux; 
and the parameters in the isostatic rebound scheme, which are optimized by using 
a coupled ice-sheet/lithosphere/asthenosphere model. 


Full Methods and any associated references are available in the online version of 
the paper.

METHODS

IcIES-MIROC model. The IcIES-MIROC model used in this study corresponds 
to the one described in ref. 22, with a few modifications explained below. The 
climate factors that control the ice-sheet changes, such as lapse rate and albedo 
feedback, are obtained from a suite of experiments using discrete GCM snapshots 
to obtain a climate parameterization. On the basis of this climate parameterization,
we drive the ice-sheet model to study the impact of orbital parameters and
atmospheric CO2 content on the change of Northern Hemisphere ice sheets.
Climate parameterization. To examine the response of climate to the orbital 
parameters, CO2 and ice sheets, the atmospheric part of the atmosphere-ocean
coupled GCM MIROC is used (K-1 developers, The University of Tokyo, 2004; 
http://ccsr.aori.u-tokyo.ac.jp/~hasumi/miroc_description.pdf). The model reso- 
lutions used in the present study are T106 (1° latitude, 1° longitude) and 20 vertical 
sigma levels with ~50-m thickness near the ice-sheet surface, and T42 with 11 
levels, as in table 1 in ref. 22. The model includes dynamical and physical processes 
such as radiative transfer and high-resolution boundary-layer physics, which are 
necessary to resolve processes crucial for modelling the ice-sheet/climate system of 
the glacial cycles. 

From the set of 18 sensitivity experiments with MIROC, including PMIP
(Paleoclimate Modelling Intercomparison Project) experiments, the climatic
effects of the changes in orbital parameters, atmospheric CO2 content, lapse rate
and surface albedo are separated and parameterized as follows: 


$$T_s = T_{\mathrm{ref}} + \Delta T_{\mathrm{sol}} + \Delta T_{\mathrm{CO_2}} + \Delta T_{\mathrm{ice}} + \Delta T_{\mathrm{nonlinear}}$$


Here $T_s$ is the surface temperature and $T_{\mathrm{ref}}$ is a reference temperature based on the
present-day climatology of the European Centre for Medium-Range Weather
Forecasts (ECMWF)/ERA-40 meteorological re-analysis data
(http://www.ecmwf.int/research/era/do/get/era-40). The terms $\Delta T_{\mathrm{CO_2}}$ and $\Delta T_{\mathrm{sol}}$ denote the change in
temperature according to changes in atmospheric CO2 content and the change in
temperature according to changes in insolation, respectively. The term $\Delta T_{\mathrm{nonlinear}}$
is a residual term due to other feedback effects. The effect of the atmospheric
response to changes in ice-sheet size, $\Delta T_{\mathrm{ice}}$, is decomposed into three terms


$$\Delta T_{\mathrm{ice}} = \Delta T_{\mathrm{lapse}} + \Delta T_{\mathrm{albedo}} + \Delta T_{\mathrm{swf}}$$


where the lapse-rate effect, $\Delta T_{\mathrm{lapse}}$, depends on the local surface elevation, the albedo effect,
$\Delta T_{\mathrm{albedo}}$, depends on the ice-sheet size, and $\Delta T_{\mathrm{swf}}$ is the stationary-wave feedback of the
atmosphere. The stationary-wave feedback is prescribed in the model runs by a temperature
map, where the lapse-rate effect is subtracted from the difference between
a model experiment with full ice-sheet topography and a model experiment with
only flat ice, but both with ice albedo. It is expressed as a product of a spatial
pattern of the temperature anomaly and a factor, r, that depends on the ice-covered
area over North America:


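$$r = \max\left[0,\ \min\left[1,\ \frac{A - A_{12\,\mathrm{kyr}}}{A_{\max} - A_{12\,\mathrm{kyr}}}\right]\right]$$


Here $A$ is the ice-covered area over North America, and $A_{12\,\mathrm{kyr}} = 8.146818 \times 10^{12}$ m² and
$A_{\max} = 1.4 \times 10^{13}$ m² are the assumed ice-covered
areas over North America 12 kyr BP and at the Last Glacial Maximum,
respectively.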
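
A minimal sketch of the weighting factor follows. It assumes the clipped linear form r = max[0, min[1, (A − A_12kyr)/(A_max − A_12kyr)]] and area values of 8.146818 × 10^12 m² and 1.4 × 10^13 m²; both the functional form and the exponents are reconstructions from a damaged equation, and the function and constant names are hypothetical.

```python
A_12KYR_M2 = 8.146818e12  # assumed ice-covered area 12 kyr BP (reconstructed exponent)
A_MAX_M2 = 1.4e13         # assumed ice-covered area at the Last Glacial Maximum


def stationary_wave_factor(ice_area_m2, a_12kyr=A_12KYR_M2, a_max=A_MAX_M2):
    """Return r in [0, 1]: 0 at or below the 12-kyr area, 1 at or above the LGM area."""
    fraction = (ice_area_m2 - a_12kyr) / (a_max - a_12kyr)
    return max(0.0, min(1.0, fraction))


# The stationary-wave temperature anomaly at a grid point would then be
# r times the prescribed spatial pattern value at that point.
r_halfway = stationary_wave_factor(0.5 * (A_12KYR_M2 + A_MAX_M2))
```

The clipping means the feedback is switched off entirely for small ice cover and saturates once the ice-covered area reaches its glacial-maximum extent.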

The parameterization of the effect of variable astronomical forcing and variable 
CO2 follows that of ref. 22. For surface melt on the ice sheet, a positive-degree-day
scheme following ref. 33 is applied. The astronomical forcing is based on ref. 34,
and the CO2 forcing is modified with revised dating.
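
For readers unfamiliar with positive-degree-day schemes, the simplest version can be sketched as follows. This is a generic illustration, not the scheme of ref. 33, whose details are not given here: the daily-temperature input, the single degree-day factor and its value are all assumptions.

```python
def positive_degree_days(daily_temps_c):
    """Sum of daily mean temperatures above 0 °C (units: °C day)."""
    return sum(t for t in daily_temps_c if t > 0.0)


def pdd_melt(daily_temps_c, degree_day_factor):
    """Melt estimate = degree-day factor (m per °C per day) times the PDD sum."""
    return degree_day_factor * positive_degree_days(daily_temps_c)


# Hypothetical four-day temperature record and a hypothetical factor of
# 0.008 m water equivalent per °C per day.
temps = [-5.0, 2.0, 3.0, -1.0]
melt_m = pdd_melt(temps, degree_day_factor=0.008)
```

Days below freezing contribute nothing, so melt responds only to the warm part of the seasonal cycle, which is why summer temperature is the controlling variable for ablation.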

IcIES ice-sheet model. The numerical ice-sheet model used in this study is the ice- 
sheet model for integrated Earth-system studies (IcIES), which is a thermomecha- 
nically coupled model in the shallow-ice approximation. The model is driven by 
surface boundary conditions such as the distributed temporal variations of climate 
in terms of surface mass balance and temperature, and by basal boundary condi- 
tions such as the bed topography, fixed geothermal heat flux and fixed sediment/ 
hard-rock distribution. Sensitivity studies on model parameters and initial conditions
are shown in Supplementary Figs 2, 4 and 5. The model modifications compared
with ref. 22 are as follows. 

(1) Basal sliding. The parameterization for basal sliding over the sediment and 
hard-rock follows ref. 36. The grid points are categorized as either sediment type or 
hard-rock type. A sliding law of the form $u_b = C H |\nabla h|^n \nabla h$ is used, where $u_b$ is the
sliding velocity, $H$ is the local ice thickness, $\nabla h$ is the surface inclination and $C$ is
the sliding coefficient. For the sediment-type grid points we use a linear sliding law
with C = 500 yr⁻¹ and n = 0, whereas for hard-rock-type grid points a nonlinear
sliding law with C = 10° yr‘ and n = 2 is used. To prescribe the sediment area in 
the ice-sheet model, a global map of sediment thickness at a resolution of 1° by 1° 
provided by SEDMAP is used. If the sediment thickness is more than 100 m, then
sediment-type basal sliding is applied; otherwise, hard-rock-type sliding is applied. 
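
The classification rule and the sliding law can be sketched in one dimension as below. The sketch assumes the law reads u_b = C H |∇h|^n ∇h with the sign as printed, uses only the sediment-type parameters (C = 500 yr⁻¹, n = 0), and leaves the hard-rock coefficient as an argument; function names are hypothetical.

```python
SEDIMENT_THICKNESS_THRESHOLD_M = 100.0
SEDIMENT_C_PER_YR = 500.0  # sliding coefficient for sediment-type points
SEDIMENT_N = 0             # exponent for the linear (sediment) law


def bed_type(sediment_thickness_m):
    """Classify a grid point: more than 100 m of sediment means sediment type."""
    if sediment_thickness_m > SEDIMENT_THICKNESS_THRESHOLD_M:
        return "sediment"
    return "hard_rock"


def sliding_velocity(ice_thickness_m, surface_slope, c_per_yr, n):
    """One-dimensional form of u_b = C * H * |grad h|**n * grad h (m per year)."""
    return c_per_yr * ice_thickness_m * abs(surface_slope) ** n * surface_slope


# Example: 1,000 m of ice with a surface slope of 0.001 over thick sediment.
u_sediment = sliding_velocity(1000.0, 0.001, SEDIMENT_C_PER_YR, SEDIMENT_N)
```

With n = 0 the velocity is linear in the slope; with n = 2 it scales with the cube of the slope, so hard-rock sliding is strongly suppressed wherever the surface is nearly flat.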

(2) Calving. In addition to the passive calving at the margin of the land
(defined by a prescribed land mask), a parameterization of active calving is
implemented to represent a potential marine ice-sheet instability. The calving
flux (acting as grounding-line flux) at the margin is applied if a grid point satisfies 
the following three conditions: the bedrock elevation at the grid point is below sea 
level, corresponding to a marine ice-sheet situation; the surface mass balance at the 
grid point is negative, corresponding to an ablation area; and the grid faces the 
ocean, that is, at least one of the eight neighbouring grid points satisfies the floating 
condition. We apply a constant calving flux by replacing the surface ablation on 
this grid point by a fixed value (−10 m yr⁻¹ in the standard run).
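
The three conditions for active calving translate directly into a boolean test. The sketch below is a simplified reading of that rule for a single grid point; the function names, the sea-level argument and the list-of-neighbours representation are assumptions made for illustration.

```python
CALVING_FLUX_M_PER_YR = -10.0  # fixed value used in the standard run


def applies_active_calving(bed_elevation_m, surface_mass_balance,
                           neighbours_floating, sea_level_m=0.0):
    """True if the point is marine, ablating, and next to a floating point."""
    is_marine = bed_elevation_m < sea_level_m
    is_ablating = surface_mass_balance < 0.0
    faces_ocean = any(neighbours_floating)  # flags for the eight neighbours
    return is_marine and is_ablating and faces_ocean


def effective_mass_balance(bed_elevation_m, surface_mass_balance, neighbours_floating):
    """Replace the surface ablation by the fixed calving flux where calving applies."""
    if applies_active_calving(bed_elevation_m, surface_mass_balance, neighbours_floating):
        return CALVING_FLUX_M_PER_YR
    return surface_mass_balance


ocean_edge = [False] * 7 + [True]  # one floating neighbour
inland = [False] * 8               # no floating neighbours
```

Because all three conditions must hold, the fixed flux only acts on a narrow band of marine margin points and leaves land-terminating ablation zones untouched.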

(3) Isostatic rebound. The dynamics of isostatic rebound is given by


$$\frac{\partial b}{\partial t} = -\frac{1}{\tau}\left(b - b_0 + \frac{\rho_i}{\rho_{\mathrm{eff}}} H\right)$$

where $b$, $b_0$, $H$, $t$ and $\rho_i$ are the transient bed elevation, the relaxed bed elevation
without ice load, the ice thickness, time and the ice density, respectively. We
introduce an effective mantle density of $\rho_{\mathrm{eff}}$ = 4,500 kg m⁻³ and a time constant
of $\tau$ = 5,000 yr in the present study (Supplementary Figs 2 and 4). The high
effective density and the time constant are optimized using the viscoelastic
Earth model of ref. 40 coupled to the IcIES ice-sheet model. The high effective
density compensates for the missing elastic forces in Earth's crust, which reduce
the total isostatic motion.
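
The delayed response of the bed can be sketched with a simple time-stepping loop. The sketch assumes the relaxation form ∂b/∂t = −(1/τ)(b − b_0 + (ρ_i/ρ_eff) H) with ρ_eff = 4,500 kg m⁻³ and τ = 5,000 yr; the ice density of 910 kg m⁻³, the constant 3,000-m ice load and the function name are assumptions not taken from the text.

```python
import math

RHO_ICE = 910.0    # kg m^-3, assumed typical ice density (not stated in the text)
RHO_EFF = 4500.0   # kg m^-3, effective mantle density
TAU_YR = 5000.0    # yr, relaxation time constant


def relax_bed(b_start, b_relaxed, ice_thickness_m, years, dt=10.0):
    """Forward-Euler integration of the bed elevation under a constant ice load."""
    b = b_start
    for _ in range(int(round(years / dt))):
        b += dt * (-(b - b_relaxed + (RHO_ICE / RHO_EFF) * ice_thickness_m) / TAU_YR)
    return b


# Bed response to a hypothetical 3,000-m-thick ice load on an initially relaxed bed.
b_equilibrium = -(RHO_ICE / RHO_EFF) * 3000.0
b_after_one_tau = relax_bed(0.0, 0.0, 3000.0, years=TAU_YR)
b_after_long_time = relax_bed(0.0, 0.0, 3000.0, years=100000.0)
```

After one time constant the bed has covered only about 63% of the way to its equilibrium depression, so a retreating ice sheet sits on a bed that is still low, which is the lag invoked to explain the high ablation during terminations.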
Nitrogen losses in anoxic marine sediments driven by
Thioploca-anammox bacterial consortia


M. G. Prokopenko, M. B. Hirst, L. De Brabandere, D. J. P. Lawrence, W. M. Berelson, J. Granger, B. X. Chang, S. Dawson,
E. J. Crane III, L. Chong, B. Thamdrup, A. Townsend-Small & D. M. Sigman


Ninety per cent of marine organic matter burial occurs in contin- 
ental margin sediments, where a substantial fraction of organic car- 
bon escapes oxidation and enters long-term geologic storage within 
sedimentary rocks. In such environments, microbial metabolism is 
limited by the diffusive supply of electron acceptors. One strategy to 
optimize energy yields in a resource-limited habitat is symbiotic 
metabolite exchange among microbial associations. Thermodynamic
and geochemical considerations indicate that microbial co-metabolisms
are likely to play a critical part in sedimentary organic
carbon cycling. Yet only one association, between methanotrophic
archaea and sulphate-reducing bacteria, has been demonstrated in
marine sediments in situ, and little is known of the role of microbial
symbiotic interactions in other sedimentary biogeochemical cycles.
Here we report in situ molecular and incubation-based evidence 
for a novel symbiotic consortium between two chemolithotrophic 
bacteria—anaerobic ammonium-oxidizing (anammox) bacteria 
and the nitrate-sequestering sulphur-oxidizing Thioploca species— 
in anoxic sediments of the Soledad basin at the Mexican Pacific 
margin. A mass balance of benthic solute fluxes and the correspond- 
ing nitrogen isotope composition of nitrate and ammonium fluxes 
indicate that anammox bacteria rely on Thioploca species for the 
supply of metabolic substrates and account for about 57 ± 21 per cent
of the total benthic N2 production. We show that Thioploca-anammox 
symbiosis intensifies benthic fixed nitrogen losses in anoxic sediments, 
bypassing diffusion-imposed limitations by efficiently coupling the 
carbon, nitrogen and sulphur cycles. 

Interspecies metabolite exchange is an effective strategy for harvesting 
potential energy in resource-limited environments. In organic-rich mar- 
ine sediments, which are characterized by a strong deficit of electron 
acceptors, symbiotic microbial consortia probably have a dominant role 
in organic carbon (Corg) cycling, given the capacity for such associations
to maximize the energy yields of metabolic electron transfers. Yet
little is known of microbial co-metabolisms in the other biogeochemical
cycles to which Corg transformations are linked. The marine nitrogen
(N) cycle is of particular interest in this respect because symbiotic microbial
associations may affect the balance of N fluxes that, through a series
of biogeochemical feedbacks, may regulate biological carbon cycling.

Nitrogen is supplied to the ocean through the fixation of atmospheric 
N2 by diazotrophic bacteria. Fixed N is converted back to N2 by ‘denitrification’
in the broad sense of the term: the enzymatic reduction of
nitrate coupled to the oxidation of Corg, metals, methane, sulphide or
ammonium (the last in a reaction known as anammox) (Supplementary
Information section 1.1). The balance between N2 fixation and
denitrification regulates the availability of N for marine photosynthesis.
Symbioses between primary producers and diazotrophs support Corg
production in severely N-limited regions of the ocean. But no similar
symbiotic associations have been demonstrated in denitrification, apart
from a tentative report of endobiotic denitrifying bacteria in allogromiid
foraminifera.

In organic-rich suboxic/anoxic sediments, denitrification is expected 
to be limited by the diffusive flux of nitrate from the overlying water.
However, in sediments of low-oxygen (O2) regions within the eastern
Pacific (Fig. 1a), substantial N2 production occurs at depths below the
diffusion depth of nitrate. From geochemical and isotopic observations,
a symbiotic partnership was proposed between the large motile
bacteria Thioploca and anammox bacteria, with the net effect of enhancing
the conversion of fixed N to N2 within anoxic sediments.

Figure 1 | Location and geochemistry of the Soledad basin sediments.
a, Map of Soledad basin location (SB). Black-boxed numbers are N2 fluxes in
mmol N m⁻² d⁻¹; white circles, data from ref. 17; black circle, data from this
work. Numbers on the contour lines are water-column [O2]
at 500 m depth (based on data from World Ocean Atlas 2009;
http://www.nodc.noaa.gov/OC5/WOA09/pr_woa09.html). The map was
generated using Ocean Data View (http://odv.awi.de/). b, Nitrate and nitrite in
pore waters extracted by whole-core squeezer and δ¹⁵N-NO₃⁻ (versus
atmospheric N2) determined as described in Supplementary Information
section 1.4. Peaks in [NO₃⁻] and [NO₂⁻] result from bursting of Thioploca cells
by squeezing the sediments.

Thioploca, a chemolithotrophic sulphur-oxidizing genus of the proteobacteria
family Thiotrichaceae, lives as filaments inside polysaccharide
sheaths up to 500 µm in diameter and up to 20 cm long.
Thioploca accumulates nitrate intracellularly to concentrations four
orders of magnitude higher than bottom-water nitrate. Filaments
glide vertically through the sediments within the sheaths, coupling
sulphide oxidation at depth to dissimilatory nitrate reduction to ammonium
(DNRA): NO₃⁻ + H₂S + H₂O → SO₄²⁻ + NH₄⁺ (refs 20, 21).
Thioploca's metabolism also results in the production of nitrite via the
intracellular deposition of granular sulphur (S⁰): NO₃⁻ + H₂S →
NO₂⁻ + S⁰ + H₂O (refs 19, 21). Ammonium and nitrite generated by
Thioploca are the required metabolic substrates for anammox bacteria,
which obtain energy by oxidizing ammonium with nitrite to N2:
NH₄⁺ + NO₂⁻ → N₂ + 2H₂O (ref. 14). Thus, Thioploca may be a
desirable biogeochemical ‘host’ for anammox bacteria, but an association
between the two organisms has not been demonstrated.
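
As a bookkeeping check on the three reactions quoted in this paragraph (DNRA, nitrite production with sulphur deposition, and anammox), the sketch below counts atoms and charge on each side. The species table and helper names are illustrative assumptions; the check says nothing about energetics or rates.

```python
from collections import Counter

# Elemental composition and charge of each species (illustrative table).
SPECIES = {
    "NO3-": ({"N": 1, "O": 3}, -1),
    "NO2-": ({"N": 1, "O": 2}, -1),
    "NH4+": ({"N": 1, "H": 4}, +1),
    "SO4^2-": ({"S": 1, "O": 4}, -2),
    "H2S": ({"H": 2, "S": 1}, 0),
    "H2O": ({"H": 2, "O": 1}, 0),
    "S0": ({"S": 1}, 0),
    "N2": ({"N": 2}, 0),
}


def side_totals(side):
    """Sum atoms and charge over a list of (coefficient, species) pairs."""
    atoms, charge = Counter(), 0
    for coeff, name in side:
        composition, q = SPECIES[name]
        for element, count in composition.items():
            atoms[element] += coeff * count
        charge += coeff * q
    return atoms, charge


def is_balanced(reactants, products):
    """True if both sides carry the same atoms and the same net charge."""
    return side_totals(reactants) == side_totals(products)


DNRA = ([(1, "NO3-"), (1, "H2S"), (1, "H2O")], [(1, "SO4^2-"), (1, "NH4+")])
NITRITE = ([(1, "NO3-"), (1, "H2S")], [(1, "NO2-"), (1, "S0"), (1, "H2O")])
ANAMMOX = ([(1, "NH4+"), (1, "NO2-")], [(1, "N2"), (2, "H2O")])
```

All three reactions balance as written, and the products of the first two (ammonium and nitrite) are exactly the reactants of the third, which is the stoichiometric basis for the proposed coupling.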

Here we present microbiological, molecular and isotopic evidence 
for the existence of a Thioploca-anammox consortium and show that
N-based co-metabolism of the consortium drives a substantial fraction 
of fixed N loss in anoxic sediments. 

The study was conducted in the eastern tropical North Pacific 
Ocean, within Soledad basin, a 544-m-deep basin separated from the 
open ocean by a sill at 250 m (Fig. 1a). Restricted circulation in the 
basin, combined with high Corg export from surface waters fuelled by
coastal upwelling, results in an anoxic water column below the sill. A
large loss of fixed N was documented in the Soledad basin sediments in
the form of an N2 efflux of 2.7 ± 0.4 mmol N m⁻² d⁻¹ across the
sediment-water interface, along with a concurrent benthic NH₄⁺ efflux of
similar magnitude, 2.7 ± 0.5 mmol N m⁻² d⁻¹ (ref. 17). An abundant
community of Thioploca of about (5 ± 2) × 10* sheaths per square
metre colonizes the rapidly accumulating (around 1-2 mm yr⁻¹)
organic-rich sediments (Supplementary Table 1A). A large subsurface
pool of nitrate and nitrite, indicative of intracellular transport by
Thioploca (Fig. 1b), and an increased ¹⁵N/¹⁴N ratio in pore-water
NH₄⁺ (with a δ¹⁵N-NH₄⁺ of up to 20‰ versus atmospheric N2), suggesting
anammox activity, were found in sediments of this basin (Supplementary
Fig. 1a and Supplementary Table 1).

Sediments and individual sheaths with filaments of Thioploca sp. were 
collected at 1 cm depth intervals for microbiological and molecular study 
(see Methods). Visually intact, vertically oriented, white Thioploca 
sheaths were present throughout the upper 6-8 cm. Below this depth, 
increasingly disintegrated sheaths were found, coincident with a sharp 
increase in pore-water H2S (Supplementary Fig. 1a). Phylogenetic analysis 
of 16S ribosomal RNA gene sequences (Supplementary Information sec- 
tion 3.4) showed that Thioploca from the Soledad basin is probably 
Thioploca araucae, recently re-classified as Candidatus Marithioploca 
araucae (see Methods). 

4',6-diamidino-2-phenylindole (DAPI) staining of Thioploca sp.
sheaths collected live from the 4-6 cm depth interval revealed a dense
population of microorganisms covering the sheaths (Fig. 2a-f).
Morphologically, these cells could be broadly classified into three
groups: two types of filamentous bacteria, 3-5 µm and 10-20 µm long
(G1), cocci less than 0.5 µm in diameter (G2) and cells 1-2 µm in
diameter DAPI-stained in a doughnut-shaped pattern (G3) (Fig. 2e, f).

Figure 2 | Anammox bacteria found in close
spatial association with Thioploca sheaths
collected in the sediments of Soledad basin.
a, Differential interference contrast image of a
Thioploca sp. sheath, collected from the Soledad
sediments in the 4-6-cm interval from multicore
MC10. b, DAPI-stained Thioploca sp. sheath,
highlighting the diverse morphologies of bacterial
cells found within the Thioploca sp. sheath; the
square defines the area shown in panels a, c and e.
c, The AMX820 rDNA probe hybridized on the
surface of Thioploca sp. sheaths with cells in a
doughnut-shaped pattern, typical of anammox
bacteria. d, Zoomed-in region of the white square
in c, with more detailed view of doughnut-shaped
cells, typical of anammox bacteria. e, Image overlay
of b and c, false-coloured with DAPI in magenta
and the AMX820 rDNA probe in yellow; G1-G3
indicate the different bacterial morphologies on the
surface of the Thioploca sp. sheath (see text).
f, Zoomed-in region of the square in e, showing a
close-up of the different bacterial morphologies
(G1-G3). Images were collected using a Leica
DMI6000 B inverted fluorescence microscope with
differential interference contrast. Serial sections
were acquired at 0.2-µm intervals; vertical data
stacks were deconvolved using the Huygens
deconvolution software (http://www.svi.nl/
HuygensSoftware) and two-dimensional
projections were created from the three-
dimensional data sets using ImageJ. Scale bars in
c and e are 10 µm, scale bars in d and f are 5 µm.
g, Maximum-likelihood 16S rRNA gene phylogeny
of anammox bacteria in the Soledad basin.
Bootstrap values based on 1,000 replicates (over
50%) are indicated at each node. Phylotypes
identified in Soledad basin sediments are in bold
with numbers in parentheses indicating the
number of clones detected in each phylotype. The
Candidatus Scalindua DNA samples labelled
‘SolTA’ (green text) were obtained from Thioploca
sheaths directly; the samples labelled ‘sed/thio’
(blue text) were from DNA extracted from
sediments surrounding Thioploca sheaths.

An Alexa488 fluorescently labelled DNA probe (complementary to 16S 
rRNA), AMX820, that targets three known anammox genera (Candidatus 
Brocadia, Candidatus Kuenenia and Candidatus Scalindua) was applied 
to Thioploca sheaths (see Methods). The probe hybridized with G3- 
type cells in a doughnut-like pattern, typical of anammox bacteria 
because of a large intracellular anammoxosome devoid of genetic 
material (G3, Fig. 2c-f). Fluorescent in situ hybridization (FISH) with 
the anammox-specific probe and the doughnut-shaped ribosomal 
and DNA-targeted staining pattern tentatively identified the G3 cells 
(Fig. 2a-f) as Candidatus Scalindua, the only marine anammox genus 
known to hybridize with the AMX820 probe (see Methods). One other 
phylum of bacteria, the Poribacteria, exhibits a similar doughnut- 
shaped morphology, but these are primarily found in association with 
sponges or in surrounding water, mostly in the presence of dissolved 
O₂ (ref. 24). However, in view of the two to three mismatches between 
known Candidatus Scalindua sequences and the AMX820 probe, 
FISH-based identification of G3 cells as an anammox organism remains 
provisional and must be considered only in the context of the other 
evidence presented below (see Methods and Supplementary Infor- 
mation section 3.5 for further details on the limitations of our FISH 
results). Some of the cocci (G2, Fig. 2e and f) also hybridized with the 
AMX820 probe, although their smaller size and the limit of resolution 
in fluorescence microscopy (about 250 nm) make it difficult to identify 
the shape of the staining pattern, so that their identification as ana- 
mmox bacteria is less certain. Cells hybridizing with the AMX820 probe 
in the doughnut-shaped pattern were found in sediments surrounding 
Thioploca sheaths throughout the 0-8 cm depth interval (Supplemen- 
tary Fig. 1b). 

The presence of the anammox organisms was verified by the phylo- 
genetic analysis of DNA material extracted from Thioploca sheaths 
and sediment samples (Fig. 2g). 16S rRNA gene sequences amplified with 
anammox-specific primers placed the Thioploca-associated 
organisms within the anammox genus Candidatus Scalindua (see 
Methods and Supplementary Information section 3.4). The majority 
of Thioploca-associated anammox operational taxonomic groups 
(OTUs) grouped with Scalindua wagneri and related taxa (Fig. 2g). 
However, two OTUs grouped with Scalindua arabica, and the remain- 
ing four (including two OTUs of sediment-extracted DNA) formed 
another group that does not seem to be closely related to previously 
characterized clades of Scalindua species. Amplification from the 
DNA material of hzoA/hzoB genes that encode one of the enzymes 
central to anammox metabolism placed the Soledad clones within the 
Candidatus Scalindua genus as well (see Supplementary Information 
section 1.2 for details and the Methods for GenBank accession numbers). 

Anammox metabolic activity was documented by quantifying ²⁹N₂ 
production in strictly anaerobic shipboard incubation experiments on 
¹⁵NH₄⁺-amended sediment slurries containing a natural population 
of Thioploca species (see Methods; ref. 11). Of the three sediment intervals 
examined, 2-4 cm, 6-8 cm and 16-18 cm, the upper two showed ²⁹N₂ 
production of 0.177 ± 0.014 and 0.042 ± 0.014 µmol N₂ per litre per 
hour respectively (Supplementary Table 1B, Supplementary Fig. 2). 
Given that neither nitrate nor nitrite was added, the anammox activity 
must have relied entirely on nitrite or nitrate available from Thioploca. 
Assuming a linear decrease in anammox rates with depth, the ²⁹N₂ 
production integrated over the 0-8 cm interval amounted to an N₂ flux of 
0.53 ± 0.05 mmol N m⁻² d⁻¹, or about 20% of the total measured N₂ 
efflux of 2.7 mmol N m⁻² d⁻¹ (ref. 17). ²⁹N₂ production ceased after 
the first sampling time point (5 h of incubation; Supplementary Fig. 2), 
though neither nitrite nor ammonium was fully consumed. Therefore, 
anammox may have been inhibited by sulphide accumulating in the 
incubations, potentially leading to underestimated incubation-based 
rates (Supplementary Information section 1.3). 
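
The depth integration described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' calculation: it assumes the two measured rates are in µmol N₂ per litre per hour, assigns them to the interval midpoints (3 cm and 7 cm), and ignores porosity and slurry dilution. The function names are hypothetical.

```python
# Sketch: integrate anammox rates over 0-8 cm assuming a linear decrease with depth.
# Assumptions (not from the paper): rates sit at interval midpoints, units are
# umol N2 per litre per hour, and no porosity or slurry-dilution correction is applied.

def integrate_linear_rate(z1_cm, r1, z2_cm, r2, top_cm=0.0, bottom_cm=8.0):
    """Integrate the straight line through (z1, r1) and (z2, r2) over depth.

    Returns umol N2 per litre per hour multiplied by cm.
    """
    slope = (r2 - r1) / (z2_cm - z1_cm)

    def rate(z_cm):
        return r1 + slope * (z_cm - z1_cm)

    # The integral of a linear function is the mean of its end values times the thickness.
    return 0.5 * (rate(top_cm) + rate(bottom_cm)) * (bottom_cm - top_cm)


def to_mmol_n_per_m2_per_day(integral):
    """Convert (umol N2 l-1 h-1 x cm) to mmol N m-2 d-1."""
    litres_per_m2_per_cm = 10.0  # a 1 m2 x 1 cm layer holds 10 litres
    hours_per_day = 24.0
    n_atoms_per_n2 = 2.0
    umol_n = integral * litres_per_m2_per_cm * hours_per_day * n_atoms_per_n2
    return umol_n / 1000.0


integral = integrate_linear_rate(3.0, 0.177, 7.0, 0.042)
flux = to_mmol_n_per_m2_per_day(integral)
print(f"{flux:.2f} mmol N m-2 d-1")
```

Under these simplifications the sketch gives a flux close to, but not identical with, the reported 0.53 mmol N m⁻² d⁻¹; the difference presumably reflects corrections described in the Supplementary Information.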

Incubation-based ²⁹N₂ production confirmed anammox activity, 
but the experimental rates may have been affected by sediment manipulation. 
Thus, we derived additional constraints on N₂ production by 
Thioploca-anammox consortia using a steady-state box model of the 
flux and isotope mass balances of nitrate uptake coupled to N₂ and 
NH₄⁺ efflux (Fig. 3, Supplementary Table 2, and Supplementary 
Information section 1.4). In the model, the total measured benthic 
ammonium efflux (J_totNH4, Table 1) is comprised of two NH₄⁺ sources, 
organic nitrogen (N_org) degradation (J_orgNH4) and DNRA (J_D-NH4 = 
J_totNH4 − J_orgNH4) (Fig. 3a). The J_orgNH4 is determined by balancing 

Figure 3 | N cycling by the Thioploca-anammox consortium and associated 
changes in δ¹⁵N of nitrate and ammonium. a, Box model of the input and 
output N fluxes and δ¹⁵N (versus atmospheric N₂) of fixed N pools and fluxes 
measured or calculated as shown in the main text and in Supplementary 
Information section 2.2. The inputs include NO₃⁻, either diffusing or 
transported into the sediments by Thioploca, and sedimentary N_org generating 
J_orgNH4 through decomposition. NO₃⁻ is converted to N₂ via denitrification, or 
to NO₂⁻ or NH₄⁺ via DNRA by Thioploca (J_Th-NH4 flux). A fraction of the J_Th-NH4 
flux is oxidized by anammox with NO₂⁻, generating the N₂ flux (the J_(Th-a)N2 
flux). The DNRA-generated NH₄⁺ flux that remains after anammox oxidation 
(J_D-NH4), together with the J_orgNH4 flux, comprises the benthic NH₄⁺ efflux 
(J_totNH4). The ranges of values shown correspond to the end-member values of 
C_e−, ε_anammox and ε_uptake as discussed in the main text and shown in 
b. b, Calculated fractions of N₂ generated by Thioploca-anammox consortia, 
F_(Th-a) = J_(Th-a)N2/J_totN2 (shown as percentages). The dashed lines are F_(Th-a) 
values calculated for the case of marine algal C_org as C source; solid lines are 
F_(Th-a) values for CH₄ as C source. The F_(Th-a) values are obtained for a 
δ¹⁵N_Th-NH4 range of 9.0-17.2‰ and for an anammox isotope effect, ε_anammox, 
between −30‰ and −12‰. The F_(Th-a) values obtained for the sub-range of 
ε_anammox of −12‰ to −20‰ suggested in ref. 18 are shown within the grey-shaded 
area, excluding a subset of values greater than 100% in the upper left corner. 

the number of electrons (e⁻) transferred to nitrate in N₂ production via 
denitrification/anammox (5e⁻) and in NH₄⁺ production via DNRA 
(8e⁻) to the number of electrons available from sedimentary C_org 
oxidation (see Supplementary Information section 2.1 for details). 
The benthic efflux of NH₄⁺ and N₂ is probably the dominant sink of 
electrons generated from the C_org oxidation because efflux of other 
reduced compounds (methane, sulphide and Fe²⁺/Mn²⁺) is negligible 
in Soledad sediments. Thus, the balance of electrons transferred from 
C_org (ultimate e⁻ donor) to NO₃⁻ (ultimate e⁻ acceptor) is: 


$$(J_{\mathrm{totNH_4}} - J_{\mathrm{orgNH_4}})(8e^-) + (J_{\mathrm{totN_2}})(5e^-) = J_{\mathrm{orgNH_4}}\,(R_{\mathrm{C/N}})\,C_{e^-} \qquad (1)$$


where J_totN2 (total measured N₂ flux) combines the N₂ fluxes from 
denitrification and anammox. J_orgNH4 (NH₄⁺ generated through N_org 
decomposition) is stoichiometrically linked to C_org oxidation through 
R_C/N, the C/N ratio of decomposing organic matter. C_e− is the number 
of electrons available from C_org oxidation, determined by the C_org 
oxidation state. In Soledad sediments, a large fraction of H₂S is produced 
by methane oxidation with sulphate, so that methane (C⁴⁻, 
C_e− = 8) and marine algal C_org (C⁰, C_e− = 4) are the two likely end-member 
C electron sources for nitrate reduction to N₂ or NH₄⁺. The 
calculated ranges for J_orgNH4 (0.4-1.0 mmol N m⁻² d⁻¹) and J_D-NH4 
(1.7-2.3 mmol N m⁻² d⁻¹) reflect the range between these end-members 
and an R_C/N range of 6.7 to 9 (Supplementary Information section 2.1). 
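
As a check on the electron balance, equation (1) can be rearranged for the organic-N ammonium flux: J_orgNH4 = (8 × J_totNH4 + 5 × J_totN2) / (R_C/N × C_e− + 8). The sketch below evaluates this for the four end-member combinations quoted in the text. The function and variable names are hypothetical, and the rearrangement is ours rather than taken from the Supplementary Information.

```python
# Sketch: solve the electron balance of equation (1) for J_orgNH4.
#   8*(J_totNH4 - J_orgNH4) + 5*J_totN2 = J_orgNH4 * R_CN * C_e
#   =>  J_orgNH4 = (8*J_totNH4 + 5*J_totN2) / (R_CN * C_e + 8)

J_TOT_NH4 = 2.7  # mmol N m-2 d-1, measured benthic NH4+ efflux (Table 1)
J_TOT_N2 = 2.7   # mmol N m-2 d-1, measured benthic N2 efflux (Table 1)


def solve_j_org_nh4(j_tot_nh4, j_tot_n2, r_cn, c_e):
    """Return the NH4+ flux from organic-N degradation (mmol N m-2 d-1)."""
    return (8.0 * j_tot_nh4 + 5.0 * j_tot_n2) / (r_cn * c_e + 8.0)


results = {}
# End-member carbon sources: methane (8 electrons per C) and algal C_org (4 per C).
for label, c_e in (("CH4", 8.0), ("Corg", 4.0)):
    for r_cn in (6.7, 9.0):  # C/N range quoted in the text
        j_org = solve_j_org_nh4(J_TOT_NH4, J_TOT_N2, r_cn, c_e)
        j_dnra = J_TOT_NH4 - j_org  # DNRA-derived NH4+ remaining after anammox
        results[(label, r_cn)] = (j_org, j_dnra)
        print(f"{label:4s} R_CN={r_cn:3.1f}  J_org={j_org:.2f}  J_D={j_dnra:.2f}")
```

The four cases span roughly 0.44-1.01 mmol N m⁻² d⁻¹ for J_orgNH4 and 1.69-2.26 mmol N m⁻² d⁻¹ for J_D-NH4, consistent with the ranges reported above.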

With J_orgNH4 and J_D-NH4 fluxes calculated, the δ¹⁵N of DNRA-derived 
NH₄⁺ (δ¹⁵N_D-NH4) is determined from the isotopic mass balance, 
J_D-NH4 × δ¹⁵N_D-NH4 + J_orgNH4 × δ¹⁵N_org = J_totNH4 × δ¹⁵N_totNH4, where 
the δ¹⁵N_org and δ¹⁵N_totNH4 are known (Table 1 and Supplementary 
Information section 2.2). Assuming no isotopic fractionation for N_org 
decomposition in these sediments, the δ¹⁵N of the DNRA-derived 
NH₄⁺ pool (δ¹⁵N_D-NH4) is calculated as 20.0-23.2‰, the range reflecting 
the two C source end-members. 
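
The isotopic mass balance can be solved directly for the DNRA-derived pool. The sketch below is a minimal illustration that uses the rounded end-member fluxes quoted in the text (J_orgNH4 of 0.4 and 1.0 mmol N m⁻² d⁻¹) and the measured values from Table 1. The function name is hypothetical, and because the inputs are rounded the output only approximates the published range.

```python
# Sketch: solve J_D*d15N_D + J_org*d15N_org = J_tot*d15N_tot for d15N_D,
# with J_D = J_tot - J_org (all fluxes in mmol N m-2 d-1, deltas in per mil).

J_TOT_NH4 = 2.7       # measured total NH4+ efflux (Table 1)
D15N_TOT_NH4 = 18.1   # measured d15N of the total NH4+ efflux (Table 1)
D15N_ORG = 9.2        # measured d15N of sedimentary organic N (Table 1)


def d15n_dnra_nh4(j_tot, d_tot, j_org, d_org):
    """Return d15N of DNRA-derived NH4+ from a two-source mass balance."""
    j_dnra = j_tot - j_org
    return (j_tot * d_tot - j_org * d_org) / j_dnra


# Rounded end-member J_orgNH4 values from the text (methane and algal C_org cases).
low = d15n_dnra_nh4(J_TOT_NH4, D15N_TOT_NH4, 0.4, D15N_ORG)
high = d15n_dnra_nh4(J_TOT_NH4, D15N_TOT_NH4, 1.0, D15N_ORG)
print(f"d15N_D-NH4 is roughly {low:.1f} to {high:.1f} per mil")
```

With these rounded inputs the result falls within a few tenths of a per mil of the reported 20.0-23.2‰.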

At steady state, the δ¹⁵N of Thioploca-generated NH₄⁺ (δ¹⁵N_Th-NH4) 
(Table 1 and Fig. 3a) should reflect bottom-water nitrate, δ¹⁵N_bwNO3, 
modified by an isotope effect associated with nitrate uptake by 
Thioploca (ε_uptake), so that δ¹⁵N_Th-NH4 ≈ δ¹⁵N_bwNO3 + ε_uptake. From 
measured δ¹⁵N_bwNO3 of 17.2‰ and an ε_uptake between 0‰ and −8.2‰ 
(see Supplementary Information sections 1.4 and 2.2), the δ¹⁵N_Th-NH4 
is 9.0-17.2‰ (for the range of ε_uptake). These δ¹⁵N_Th-NH4 values are 


Table 1 | Measured and calculated benthic N fluxes 

Measured benthic N fluxes and δ¹⁵N values 

Symbol                   Description                                          Value 
J_totNH4                 Total benthic NH₄⁺ flux                              2.7 ± 0.5 mmol N m⁻² d⁻¹ 
J_totN2                  Total benthic N₂ flux                                2.7 ± 0.4 mmol N m⁻² d⁻¹ 
δ¹⁵N_org                 δ¹⁵N of sedimentary organic N                        9.2 ± 0.2‰ 
δ¹⁵N_bwNO3               δ¹⁵N of bottom-water NO₃⁻                            17.2 ± 0.1‰ 
δ¹⁵N_totNH4              δ¹⁵N of the total ammonium efflux                    18.1 ± 1.1‰ 
ε_uptake                 Net isotopic effect of sedimentary nitrate uptake    −4.1 ± 1.2‰ 

Calculated benthic N fluxes and δ¹⁵N values 

Symbol                   Description                                          Value 
J_orgNH4                 NH₄⁺ efflux from N_org degradation                   0.4-1.0 mmol N m⁻² d⁻¹* 
J_D-NH4                  NH₄⁺ efflux from DNRA (after anammox)                1.7-2.3 mmol N m⁻² d⁻¹* 
J_Th-NH4                 NH₄⁺ flux generated by Thioploca                     2.3-3.0 mmol N m⁻² d⁻¹ 
J_a = J_Th-NH4 − J_D-NH4 NH₄⁺ oxidized by anammox                             0.34-1.35 mmol N m⁻² d⁻¹* 
δ¹⁵N_D-NH4               δ¹⁵N of NH₄⁺ from DNRA (after anammox)               20.0-23.2‰* 
δ¹⁵N_Th-NH4              δ¹⁵N of NH₄⁺ generated by Thioploca                  9.0-17.2‰ 

* The range refers to variable oxidation states of C, C⁴⁻ (CH₄) and C⁰ (C_org). Major nitrogen fluxes across 
the sediment-water interface and isotopic compositions (δ¹⁵N versus air N₂) of major fixed N pools and 
fluxes were measured as described in Supplementary Information section 1.4. J_orgNH4 and J_D-NH4 were 
calculated using Supplementary equations (1) and (2). δ¹⁵N_D-NH4 was calculated using Supplementary 
equation (3). J_Th-NH4 was calculated using Supplementary equations (4) to (8). See Supplementary 
Information sections 2.1-2.2. 

3-14‰ lower than the 20.0-23.2‰ range of the δ¹⁵N_D-NH4, which is 
consistent with partial oxidation of J_Th-NH4 by the anammox cells 
observed in close association with the Thioploca filaments (Fig. 2). 

In Supplementary Information section 2.2, we show that the fraction 
of Thioploca-generated ammonium that is oxidized by anammox 
bacteria (f = J_a/J_Th-NH4, where J_a = J_Th-NH4 − J_D-NH4 is NH₄⁺ oxidation 
by anammox) can be determined as: 

$$f = \frac{\delta^{15}\mathrm{N}_{\mathrm{Th\text{-}NH_4}} - \delta^{15}\mathrm{N}_{\mathrm{D\text{-}NH_4}}}{\delta^{15}\mathrm{N}_{\mathrm{totNH_4}} + \varepsilon_{\mathrm{anammox}} - \delta^{15}\mathrm{N}_{\mathrm{D\text{-}NH_4}}} \qquad (2)$$

where ε_anammox is the isotope effect of anammox with respect to NH₄⁺ 
consumption. The value of f is estimated for the range of ε_anammox between 
−12‰ and −30‰ (see Supplementary Information section 2.2 for discussion 
of the chosen range). The ammonium oxidation flux, J_a, is 
obtained from J_a = f × J_Th-NH4 (Table 1 and Supplementary Information 
section 2.2). Because J_a represents only half of the N₂ production 
by anammox (the second N atom deriving from nitrite), the fractional 
contribution of Thioploca-anammox-generated N₂ (J_(Th-a)N2) to the total 
measured benthic N₂ efflux (J_totN2) can be calculated as F_(Th-a) = 2 × J_a/J_totN2. 
For the C⁴⁻ (methane), the median J_(Th-a)N2 was calculated to be 
1.5 ± 0.6 (±s.d.) mmol N m⁻² d⁻¹ (F_(Th-a) = 55 ± 23%, Fig. 3b, solid lines); 
for C⁰ (C_org), the median of J_(Th-a)N2 was 1.6 ± 0.5 (±s.d.) mmol N m⁻² d⁻¹ 
(F_(Th-a) = 60 ± 19%, Fig. 3b, dashed lines). 
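
The chain from isotope values to the anammox share of N₂ can be sketched as follows. This is an illustration for one arbitrary parameter choice, not a reproduction of the published medians. It reads equation (2) as f = (δ¹⁵N_Th-NH4 − δ¹⁵N_D-NH4) / (δ¹⁵N_totNH4 + ε_anammox − δ¹⁵N_D-NH4), and it combines J_a = f × J_Th-NH4 with J_Th-NH4 = J_D-NH4 + J_a to obtain J_a = f × J_D-NH4 / (1 − f). The function names and the particular input values chosen from within the quoted ranges are assumptions.

```python
# Sketch: fraction of Thioploca-generated NH4+ oxidized by anammox (equation 2)
# and the resulting share of the total benthic N2 efflux.

def anammox_fraction(d_th, d_dnra, d_tot, eps_anammox):
    """Equation (2): all arguments are d15N values or isotope effects in per mil."""
    return (d_th - d_dnra) / (d_tot + eps_anammox - d_dnra)


def thioploca_anammox_n2(f, j_dnra, j_tot_n2):
    """Return (J_a, J_(Th-a)N2, F_(Th-a)) for a given fraction f.

    Uses J_a = f * J_Th and J_Th = J_D + J_a, so J_a = f * J_D / (1 - f).
    Each N2 from anammox carries one N from NH4+ and one from nitrite,
    hence the factor of two.
    """
    j_a = f * j_dnra / (1.0 - f)
    j_n2 = 2.0 * j_a
    return j_a, j_n2, j_n2 / j_tot_n2


# Illustrative inputs picked from within the ranges quoted in the text:
d_th = 17.2 - 4.1   # bottom-water nitrate d15N plus the measured uptake effect
f = anammox_fraction(d_th, d_dnra=20.0, d_tot=18.1, eps_anammox=-20.0)
j_a, j_n2, share = thioploca_anammox_n2(f, j_dnra=2.3, j_tot_n2=2.7)
print(f"f={f:.2f}  J_a={j_a:.2f}  J_N2={j_n2:.2f}  F={100 * share:.0f}%")
```

Sweeping d_th and eps_anammox over their full ranges, rather than fixing one combination as here, is what produces the contour lines described for Fig. 3b.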

To summarize the model results, about 57 ± 21% of N₂ production is 
derived from anammox relying on Thioploca for the supply of nitrite. 
Despite the potential artefacts of the sediment incubations and simplifying 
assumptions made in the model (for example, steady state), the 
model-based estimate of the Thioploca-anammox N₂ production agrees 
within a factor of two with the ¹⁵N incubation-based anammox rates. 

Thioploca serves as a beneficial host for anammox bacteria by pro- 
viding the metabolically required substrates (as summarized in Sup- 
plementary Fig. 7 and Supplementary Information). However, whether 
the described symbiosis represents mutualism, syntrophy or commens- 
alism remains uncertain. At sulphidic boundaries, Thioploca may need 
to rapidly convert toxic HS⁻ to elemental S⁰, generating nitrite (Supplementary 
Fig. 6). Nitrite is a potentially toxic metabolite, so active 
nitrite removal by anammox would be advantageous for Thioploca, 
benefiting both bacteria (mutualism). If Thioploca actually requires 
anammox to remove nitrite, this mutualism would be classified as a 
syntrophy. Alternatively, anammox may simply be using nitrite leaking 
from Thioploca with no benefit to Thioploca (commensalism). 

Analysis of sedimentary N fluxes shows that the Thioploca-anammox 
consortium enhances the benthic loss of fixed N that is otherwise limited 
by the diffusive supply of solutes in anoxic sediments. Fixed N losses via 
all forms of denitrification are ultimately driven by the supply of sinking 
C_org, the fraction of primary production exported from the surface 
ocean. Fixed N loss, in turn, may act as a negative feedback on primary 
production through N limitation of phytoplankton growth, thus stabilizing 
the biological C cycle. 

The degree of coupling between the N and C cycles through the 
sedimentary component of this feedback can be diagnosed from the 
molar ratio of fixed N loss to total C_org oxidation, N_loss/C_oxid. Globally, 
N_loss/C_oxid initially increases with decreasing bottom water [O₂] 
(arrow 1 in Fig. 4) and is then expected to decrease in sediments 
underlying suboxic/anoxic bottom waters owing to the diffusion limitation 
of the nitrate supply below the sediment-water interface 
(arrow 2 in Fig. 4). In the sediments of Soledad basin, total depth-integrated 
C_org oxidation is 3.3 ± 1.0 mmol C m⁻² d⁻¹ (see Supplementary 
Information section 2.3). The resulting N_loss/C_oxid ratio of 0.82, 
higher than in other sedimentary environments lacking the Thioploca-anammox 
assemblages (Fig. 4 and refs 17, 18, 27-29), suggests that the 
Thioploca-anammox symbiosis acts to tighten the sedimentary 
C_org-supply/N-loss feedback. Thioploca species occurrences have been 
documented in anoxic sediments globally. If widespread, the 
Thioploca-anammox symbiosis may represent an important sedimentary 

Figure 4 | Relationship between benthic N_loss/C_oxid ratio and bottom-water 
[O₂]. Values calculated from published sedimentary C_org oxidation 
rates (C_oxid) and fixed N_loss values are given for comparison. Error bars 
represent one standard error based on reported uncertainties in [O₂] 
measurements and flux determinations. 


sink for fixed N and may become increasingly important in a warming, 
less oxygenated ocean. 


METHODS SUMMARY 


Multicores (SOL MC8 and SOL MC10) were collected in Soledad basin in 2009 
during a cruise on the RV New Horizon, at a depth of 544 m (station coordinates: 
25° 12.49’ N/112° 42.27' W and 25° 12.33’ N/112° 42.09’ W). 

Cores were sectioned in 1-cm intervals (upper 10 cm) or 3-cm intervals (below 
10 cm to the bottom of the core). Thioploca sheaths carefully extracted from the 
sediment and sediment samples used for FISH analysis were fixed in 4% paraformaldehyde, 
embedded in polyacrylamide, and attached to coverslips. Coverslips 
were pre-treated with a 10 mg ml⁻¹ solution of lysozyme at room temperature 
and subsequently at 37 °C for 20 min and incubated at 46 °C overnight with hybridization 
buffer containing 10 ng µl⁻¹ AMX820 probe (see Methods). After 18 h, 
coverslips were incubated with wash buffer at 46 °C for 20 min twice and mounted 
in Prolong Gold Antifade Reagent with DAPI (also see Methods). 

Partial 16S small subunit rRNA genes were amplified using anammox-specific 
primers and the hydrazine oxido-reductase genes hzoA/hzoB were amplified with 
nested primer pairs (hzoAB1F, hzoAB1R) and (hzoAB4F, hzoAB4R), or with 
(hzo888f, hzol1101r) (see Methods, Supplementary Information sections 1.2 and 
3.4 and Supplementary Table 4). 

Three cores were incubated for 24 h at in situ temperature, and overlying water 
was sampled at regular time intervals for determination of δ¹⁵NH₄⁺ and δ¹⁵NO₃⁻ 
of the corresponding fluxes (details in Supplementary Information sections 1.4 
and 3.1-3.3). 

For determination of anammox activity, sediment cores were sectioned in 2-cm 
intervals under anaerobic conditions, aliquots were mixed into a slurry with low-nutrient 
surface sea water, and after addition of ¹⁵NH₄⁺ label (no ¹⁵NO₃⁻/NO₂⁻ was 
added), incubated for a total of 22 h. Total anammox activity was calculated based 
on the excess production of ²⁹N₂, accounting for isotope dilution (ref. 11; see Methods). 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 

Collection of Thioploca sp. and sediments for DNA extraction and FISH. 
Multicores (SOL MC8 and SOL MC10) were collected at a depth of 544 m, at 
25° 12.49’ N and 112° 42.27’ W and 25° 12.33’ N and 112° 42.09’ W respectively. 
Samples of Thioploca sheaths with filaments and sediment aliquots were collected 
in 1-cm intervals down to 10 cm depth and then every 3 cm down to the bottom of 
the core. Thioploca sp. samples were washed twice with overlying sea water and 
spun down at 900g (where g is the acceleration due to gravity) for 5 min at room 
temperature (about 27 °C) to remove sediment from the Thioploca sp. sheaths and 
filaments. Thioploca samples and sediments collected in 1.5-ml tubes were stored 
at —80 °C for further DNA extraction and analysis (see Supplementary Informa- 
tion section 3.4). Both types of samples (Thioploca sp. sheaths and sediments) to be 
analysed with FISH were resuspended in 400 µl of 1× Hepes buffered saline 
(HBS), and fixed with paraformaldehyde (with a final concentration of 4%) for 
30 min at room temperature while rotating. Samples were spun at 900g for 5 min at 
room temperature, and, after removing the supernatant, washed in 1 ml of PEM 
(comprising 100 mM piperazine-N,N'-bis(2-ethanesulphonic acid) (PIPES) plus 
1 mM ethylene glycol tetraacetic acid (EGTA) plus 0.1 mM MgSO₄) twice. 
Samples were re-suspended in PEM and stored at 4 °C. 

Thioploca sp. determination. Genomic DNA was extracted from sediments 
using the MoBio PowerBioFilm DNA isolation kit. The 16S rRNA gene specific 
to Thioploca was amplified using a broad γ-Proteobacteria forward primer 
(109f2), and a reverse primer specific to Thioploca spp. (829-Thioploca). 
Amplification products were cloned using the TOPO TA Cloning kit for sequen- 
cing (Invitrogen). Twenty positive transformants were selected for sequencing. 
Sequences were checked for chimaeras using Mallard 1.02 (ref. 32) before phylo- 
genetic analysis. Phylogenetic analysis of 16S rRNA gene sequences (Supplemen- 
tary Information section 3.4) showed that Thioploca from the Soledad basin is 
probably Thioploca araucae, recently re-classified as Candidatus Marithioploca 
araucae. 

FISH using the probe AMX820. Prior to FISH, fixed Thioploca sp. (from sample 
SOL MC10, 4-6 cm) and sediment splits (from sample SOL MC8, 4-5 cm) were 
attached to slides using activated coverslips crosslinked to polyacrylamide sheets. 
Thioploca sheaths or sediment were embedded in thin sheets of polyacrylamide gel 
that were covalently attached to activated coverslips. Two pieces of Thioploca 
sp. were placed onto activated coverslips, 25 µl of polyacrylamide solution was 
pipetted over Thioploca sp., and the droplet was flattened using a large circular 
coverslip (number 1.5, 22-mm diameter). The sandwich was polymerized for 
30 min, the circular glass coverslip removed with a razor blade, and the gel washed 
on a shaker with 50 mM HEPES pH = 8.5. 

Twenty microlitres of a 10 mg ml⁻¹ solution of lysozyme (in autoclaved distilled 
water) were placed in the bottom of a culture dish plate. Embedded Thioploca sp. 
or sediment coverslips were placed face-down on the lysozyme solution at room 
temperature for 20 min, and subsequently placed at 37 °C for 20 min. Embedded 
Thioploca sp. sheaths and sediment coverslips were washed in a 1× phosphate 
buffered saline (PBS) solution twice, and hybridized with pre-warmed hybridization 
buffer containing 5 µl of 100 ng µl⁻¹ S-*-Amx-0820-a-A-22 probe labelled 
with fluorescent dye Alexa488 (AMX820, sequence: 5′-AAAACCCCTCTACTTAGTGCCC-3′), 
which targets the genera Candidatus Brocadia, Candidatus 
Kuenenia and Candidatus Scalindua, or no probe as negative control. The 
hybridization buffer contained distilled, autoclaved water, 0.9 M NaCl, 0.02 M 
Tris-HCl, pH = 8, and 0.01% SDS (sodium dodecyl sulphate) (0% formamide) 
and hybridization occurred overnight in a 46 °C water bath in a humidifying 
chamber. 

Our Candidatus Scalindua sequences showed 10-12% mismatches when compared 
to both Candidatus Brocadia and Candidatus Kuenenia 16S rRNA gene 
sequences of homologous, conserved positions using the ARB database (accession 
numbers of sequences used for comparison are: AF375995, CT573071, 
EU478693, PQ459989, AF375994). Furthermore, the AMX820 probe was found 
to have two mismatches with Scalindua arabica, three mismatches with S. wagneri, 
and between two and three mismatches with S. brodae and S. marina. Thus, 
hybridization conditions were optimized by varying formamide concentration, 
salt concentration (wash buffer), and temperature with respect to the potential 
mismatches between the AMX820 probe and Candidatus Scalindua species. 
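
Counting probe-target mismatches of the kind reported above is straightforward to sketch. The example below takes the AMX820 probe sequence given in the text, derives the perfectly matching target site as its reverse complement (written in DNA letters, as in a 16S rRNA gene sequence), and counts differences against a candidate site. The candidate site here is hypothetical, built by substituting two bases for illustration. It is not a real Scalindua sequence, and the function names are our own.

```python
# Sketch: count mismatches between a FISH probe and a candidate target site.
# The probe binds the rRNA, so a perfect target site (in gene/DNA notation)
# is the reverse complement of the probe.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

AMX820 = "AAAACCCCTCTACTTAGTGCCC"  # 5'->3', as given in the Methods


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(probe, target_site):
    """Count positions where target_site differs from the probe's perfect target."""
    expected = reverse_complement(probe)
    if len(expected) != len(target_site):
        raise ValueError("target site must have the same length as the probe")
    return sum(a != b for a, b in zip(expected, target_site))


perfect_site = reverse_complement(AMX820)

# Hypothetical site: the perfect target with two substituted bases.
site = list(perfect_site)
site[5] = "T"    # C -> T
site[12] = "A"   # G -> A
hypothetical_site = "".join(site)

print(count_mismatches(AMX820, perfect_site))       # perfect match
print(count_mismatches(AMX820, hypothetical_site))  # two substitutions
```

In practice the candidate sites would be extracted from aligned 16S rRNA gene sequences at the probe's binding position; that alignment step is outside this sketch.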

The following morning, embedded Thioploca filament or sediment coverslips 
were washed with wash buffer containing the following: distilled, autoclaved water, 
0.9M NaCl, 0.02 M Tris-HCl, pH = 8, and 0.01% SDS. Coverslips were washed for 
20 min at 46 °C twice and then mounted in 30 µl Prolong Gold Antifade reagent with 
DAPI. Slides were dried and stored in a cool, dry place for at least 20 h before 
visualizing. Images were collected using a Leica DMI6000 B inverted fluorescence 
microscope with differential interference contrast. Serial sections were acquired at 
0.2-µm intervals. Data stacks were deconvolved using the Huygens deconvolution 
software and two-dimensional projections were created from the three-dimensional 
data sets using ImageJ. 

Sample hybridization with the AMX820 probe and the ‘no probe’ negative 
controls were conducted on the same day and imaged simultaneously. ‘No probe’ 
controls revealed no autofluorescence of doughnut-shaped cells or background 
autofluorescence in the sample. At the request of a reviewer, two years after the 
main experiments were conducted, a ‘nonsense’ probe control, RPE0108Dasy, was 
hybridized with Thioploca sheaths. The ‘nonsense’ probe was labelled with the 
same amino-terminal Alexa488 fluorescent dye as was used in the original FISH 
experiment with the AMX820 probe. The hybridization conditions were identical 
to those used in the initial experiment. The ‘nonsense’ probe did not hybridize with 
Thioploca sheaths or bacterial cell walls (see Supplementary Fig. 8), indicating the 
absence of unspecific binding of the Alexa488 dye to any material in the sample 
(see Supplementary Information section 3.5 for further details on the limitations of 
the ‘nonsense’ probe experiment, including a discussion of the ageing of the 
samples). All images (including images of the ‘nonsense’ probe hybridization) 
were collected with the same fluorescence intensity, the same exposure time, 
and subject to the same deconvolution parameters using the Huygens deconvolu- 
tion software. 

Taking into account modifications in stringency of hybridization conditions and 
the overall 10-12% mismatches between the AMX820 probe and the targeted 
sequences, identification of Thioploca-associated anammox based on rRNA-targeted 
FISH with the AMX820 probe should be considered preliminary until it is confirmed 
with a 100% matching probe or a Clone-FISH procedure. 

PCR amplification of anammox bacteria DNA. To further verify the presence of 
anammox bacteria, DNA material was extracted from sediment samples and 
Thioploca sheaths and amplified with polymerase chain reaction (PCR) (see 
Supplementary Information section 3.4 for details on DNA extraction). Partial 
16S small subunit rRNA genes were amplified using anammox-specific primers, 
(1) An7F with AMX 820R, (2) S-P-Planc-0046-18F with S-*-Amx-0368-a-A-18r 
(Supplementary Table 4) with PCR products cloned and sequenced. Sequences with 
over 98.7% sequence identity formed 12 operational taxonomic groups that represented 
broad diversity within the marine genus of anammox bacteria, Scalindua 
(Fig. 2g in the main text and Supplementary Information section 1.2). In addition, 
the hydrazine oxido-reductase genes, hzoA/hzoB, encoding one of the enzymes 
central for anammox metabolism, were amplified with nested primer pairs 
hzoAB1F, hzoAB1R; and hzoAB4F, hzoAB4R (Supplementary Fig. 1c). Hzo-gene 
and partial 16S rRNA gene sequences have been submitted to GenBank (accession 
numbers JQ234655-672, JX945900-903, JX945905, JX945907-908, JX945910, 
JX945913, JX945915, JX945917-919, JX945921-928, JX945930 and JX945932-963). 

Sediment ¹⁵NH₄⁺ incubations for determining anammox activity. Following the 
protocol described in ref. 11, sediments were collected by the multicorer onboard 
the ship and immediately placed in a cold room at in situ temperature (10 °C). The 
sediment core was sliced inside a polyethylene glove bag flushed twice with N₂ 
before closing. Slices collected from 2-4 cm, 6-8 cm and 16-18 cm depth intervals 
were kept inside the glove bag for ¹⁵NH₄⁺ amendments to measure anammox 
activity. The 2-4 cm and 6-8 cm intervals were chosen based on the observed live 
Thioploca sp. distribution, whereas the 16-18 cm interval was chosen as the lowest 
sediment depth into which Thioploca sp. visibly extended. 

Homogenized sediments were subsampled for porosity and initial ammonium 
concentrations. Concentrated ¹⁵NH₄Cl solution (100 mmol per litre) was added to 
final ¹⁵NH₄⁺ fractions (F_N) between 0.42 and 0.62. Homogenized sediments were 
sampled for final ammonium concentrations 10 min after tracer addition to allow for 
sediment adsorption of NH₄⁺. Finally, sediments were distributed over ten 12-ml 
Exetainers (Labco Limited, UK), each containing 9 ml deoxygenated artificial sea 
water (Red Sea salt solution of salinity 35.2 g kg⁻¹, or 35 practical salinity units). 

Time series incubations lasted 22 h, during which two Exetainers were sacrificed 
every 5 h (Supplementary Fig. 2). Five millilitres of sediment slurry was removed 
and replaced by a helium headspace. Microbial activity was stopped by injecting 
200 µl of a 50% (w/v) ZnCl₂ solution into the Exetainers. Two millilitres of 
sediment slurry was immediately injected into a He-flushed 12-ml Exetainer for 
mass spectrometric determination of ²⁸N₂, ²⁹N₂ and ³⁰N₂. The leftover slurry was 
filtered and kept frozen for [NH₄⁺] measurements and mass spectrometric determination 
of the labelled NH₄⁺ fraction in the sediment. 

Ammonium concentrations were determined using the flow injection method 
with conductivity detection. The fraction of ¹⁵NH₄⁺ in the NH₄⁺ pool was 
determined through hypobromite conversion of NH₄⁺ to N₂ (refs 42, 43). Total 
anammox activity was calculated based on the excess production of ²⁹N₂ and F_N 
(ref. 11). 
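
One common way to turn excess ²⁹N₂ production into a total anammox rate in ¹⁵NH₄⁺-labelled incubations is to divide the ²⁹N₂ production rate by the labelled fraction of the ammonium pool. The sketch below applies that formulation. It is an assumption about how the calculation is usually set up, since the text defers the details to ref. 11, and the numbers used are hypothetical rather than measured values.

```python
# Sketch: total anammox N2 production from excess 29N2 in a 15NH4+ incubation.
# Assumed formulation (the paper defers to ref. 11 for the actual procedure):
#   anammox rate = (d[excess 29N2]/dt) / F
# where F is the 15N-labelled fraction of the NH4+ pool.

def anammox_rate(excess_29n2_start, excess_29n2_end, hours, labelled_fraction):
    """Return the anammox N2 production rate in umol N2 per litre per hour.

    excess_29n2_* : excess 29N2 concentrations (umol per litre) at two time points
    hours         : time between the two samplings
    labelled_fraction : 15NH4+ / total NH4+ (0 < F <= 1)
    """
    if not 0.0 < labelled_fraction <= 1.0:
        raise ValueError("labelled fraction must be in (0, 1]")
    if hours <= 0.0:
        raise ValueError("time interval must be positive")
    production_29 = (excess_29n2_end - excess_29n2_start) / hours
    # Only a fraction F of the NH4+ consumed by anammox carries the label,
    # so the 29N2 signal is scaled up to the total.
    return production_29 / labelled_fraction


# Hypothetical numbers for illustration only.
rate = anammox_rate(excess_29n2_start=0.0, excess_29n2_end=0.5,
                    hours=5.0, labelled_fraction=0.5)
print(f"{rate:.2f} umol N2 per litre per hour")
```

Because ²⁹N₂ production stopped after the first 5 h sampling point in these incubations, a two-point estimate over the first interval, as here, is the natural choice for this data set.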

A new arboreal haramiyid shows the diversity of 
crown mammals in the Jurassic period 


Xiaoting Zheng, Shundong Bi, Xiaoli Wang & Jin Meng 


A major unsolved problem in mammalian evolution is the origin of 
Allotheria, including Multituberculata and Haramiyida. Multituberculates 
are the most diverse and best known Mesozoic era mammals 
and ecologically resemble rodents, but haramiyids are known mainly 
from isolated teeth, hampering our search for their phylogenetic 
relationships. Here we report a new haramiyid from the Jurassic 
period of China, which is, to our knowledge, the largest reported so 
far. It has a novel dentition, a mandible resembling advanced multituberculates 
and postcranial features adapted for arboreal life. Our 
phylogenetic analysis places Haramiyida within crown Mammalia, 
suggesting the origin of crown Mammalia in the Late Triassic period 
and diversification in the Jurassic, which contrasts with other estimated 
divergence times of crown Mammalia. The new haramiyid reveals 
additional mammalian features of the group, helps to identify other 
haramiyids represented by isolated teeth, and shows again that, 
regardless of various phylogenetic scenarios, a complex pattern of 
evolution involving many convergences and/or reversals existed in 
Mesozoic mammals. 


Mammalia Linnaeus, 1758 
Allotheria Marsh, 1880 
Haramiyida Hahn, Sigogneau-Russell and Wouters, 1989 
Arboroharamiyidae gen. nov. 
Arboroharamiya jenkinsi gen. et sp. nov. 


Etymology. arbor (Latin): tree; Haramiya (Arabic): trickster, petty 
thief; jenkinsi: in honour of Farish A. Jenkins Jr for his contribution 
to the study of Mesozoic mammals, haramiyids included. 

Holotype. A partial skeleton with both mandibles associated with teeth 
and isolated upper teeth (STM33-9, Tianyu Museum of Nature, 
Shandong Province, China; Fig. 1 and Supplementary Figs 1-7). 
Locality and horizon. The holotype is from the Middle-Late Jurassic 
Tiaojishan Formation in the town of Mutoudeng, Hebei Province, China, 
dated about 160 million years (Myr) (Supplementary Information). 
Diagnosis. The largest known haramiyid with a body mass estimated 
at 354 g; dental formula I1?-C0-P2-M2/i1-c0-p1-m2; the enlarged lower 
incisor fully covered with enamel; the only lower premolar high, triangular 


Figure 1 | The holotype specimen 
and line drawing of Arboroharamiya 
jenkinsi (STM33-9). The 
counterpart of the holotype is 
illustrated in Supplementary Fig. 1. 
ca, calcaneum; ca1-17, first to 
seventeenth caudal vertebrae; ip, 
intermediate phalanges; ip2-5, 
second to fifth intermediate 
phalanges; mc2-5, second to fifth 
metacarpals; mt1-5, second to fifth 
left metatarsals; mt2-5, second to 
fifth right metatarsals; l1-7, first to 
seventh lumbar vertebrae; lc, left 
clavicle; lfe, left femur; lfi, left fibula; 
li, left ilium; lis, left ischium; lm, left 
mandible; lra, left radius; lti, left 
tibia; lu, left ulna; pp, proximal 
phalanges; pp1-5, first to fifth 
proximal phalanges; rfe, right femur; 
rfi, right fibula; ri, right ilium; rm, 
right mandible; rra, right radius; rti, 
right tibia; ru, right ulna; t, thoracic 
vertebrae; tp, terminal phalanges; 
tp1-5, first to fifth terminal 
phalanges; tr, thoracic ribs; s1-2, 
first to second sacral vertebrae; 

?, unknown element. 
in lateral view and lacking serrations; the lower molar having a greatly 
inflated mesiolingual cusp (a1) and a central basin mesiodistally elongate 
and deeper distally than mesially; the upper dentition having at least 
a multi-cuspate incisor, probably two premolars and two molars; the 
distobuccal cusp (A1) the largest in the upper molars; the deep mandible 
lacking the postdentary trough and Meckelian groove but having a small 
angular process, a small coronoid process and a mandibular condyle that 
levels with the tooth row and is dorsoventrally oriented; limbs slim with 
proportionally short metapodials but long phalanges (Supplementary 
Information). 

Arboroharamiya, as with other mammals, has body hair (preserved 
as impressions), a single-boned (dentary) mandible that implies a 
three-boned middle ear, and the digit formula of 2-3-3-3-3. The den- 
tition is differentiated into incisors and multi-rooted premolars and 
molars, with the canine presumably lost. It differs from other mam- 
mals but is similar to allotherians in having two mesiodistally aligned 
rows of cusps that allow orthal (vertical) and palinal (backward), but 
not proal (forward) or transverse, jaw movement in mastication. 

Arboroharamiya is also similar to other haramiyids in having basined 
molars with cusps of uneven heights, but differs from them in being 
larger and having a more inflated a1 and elongate central basin surrounded 
by more cusps in lower molars. Arboroharamiya further resembles 
multituberculates in having one pair of enlarged lower incisors, a 
multi-cuspate upper incisor, loss of the canine, two upper and lower 
molars, and a mandible with a mesially extended masseteric fossa 
and a condyle positioned low and oriented more vertically than transversely. 
It differs from the Jurassic multituberculates in having a small 
angular process and a highly specialized dentition with one lower 
premolar and two upper premolars and with an occlusal pattern in 
which the enlarged a1 bites into the basin of the upper molar. 

Previously, the only known haramiyid with the mandible and den- 
tition preserved was Haramiyavia’’, which was thought to resemble 
Morganucodon and Kuehneotherium in having the masseteric fossa 
un-extended beyond the posterior part of the last molar, the condyle 
above the level of the teeth, and presence of the postdentary trough””. 
Although Arboroharamiya is similar to Haramiyavia in having a1 as 
the largest cusp and the lower molar basin deeper distally than mesially 
and possessing a gracile postcranial skeleton, it is morphologically 
more advanced than Haramiyavia in lacking the canine and having 
fewer incisors, premolars and molars. Most importantly, the dentary of 
Arboroharamiya is highly specialized and considerably different from 
that of Haramiyavia, but similar to those of derived multituberculates 
such as taeniolabidids, in being short and deep and in lacking the 
postdentary trough, which is one of the characteristics previously used 
to differentiate haramiyids from multituberculates”*””. 

Assignment of tooth locus and thus identification of species for most 
haramiyids are ambiguous because only isolated teeth are preserved’. 
The discovery of Haramiyavia" helped to solve part of the puzzle. 
However, because the upper premolars of Haramiyavia are not pre- 
served, the difference of the upper premolar and molar of haramiyids 
remains unclear. In light of the new findings from Arboroharamiya 
(Fig. 2 and Supplementary Fig. 3), some haramiyid teeth identified as 
upper molars*"' are most likely premolars. Similarly, the largest cusp 
identified as b2 (mesiobuccal) in the lower molar of eleutherodontid 
haramiyids*''? is probably a1, and, if so, it would result in a different 
interpretation of the occlusal pattern in those haramiyids. 

The occlusal pattern is a critical feature that was used not only to 
identify tooth locus and species within haramiyids**”, but more impor- 
tantly to distinguish Allotheria from other Jurassic mammals that have 
transverse jaw movement during mastication. In Allotheria, the upper 
and lower molariform teeth have essentially two longitudinal rows of 
cusps, and the buccal row of the lower molar was considered to bite into 
the valley between the two rows of the upper molar, involving orthal 
and/or palinal jaw movement”. The tooth morphology and wear 
pattern of Arboroharamiya confirm the orthal and palinal jaw move- 
ments in haramiyids, but nonetheless demonstrate that it is impossible 
Figure 2 | Teeth, mandibles and tooth occlusal relationships of 
Arboroharamiya jenkinsi. a, Occlusal views of right upper and lower incisors 
(I/i), premolars (P/p) and molars (M/m). Some of the teeth (Supplementary 
Fig. 3) have been photographically flipped in the stippling drawings. The 
general shape of p4 is similar to that of Kermackodon, one of the earliest known 
multituberculates’, but differs from it in lacking serrations. m1 and m2 are 
similar in having a high and inflated cusp a1 with cusps decreasing in height 
distally. The distal end of the central basin is closed by cusps. Enamel ridges 
extend distally from cusps towards the basin, which enhance grinding as A1 of 
the upper molar ‘moves’ in the valley. The upper premolar differs from molars 
in being more rounded, with the broad central basin bearing numerous small 
cusps or crenulations. Upper molars are more mesiodistally elongate and have 
cusps A1 and A5 the largest and ridges extending mesially. Cusp B3 is the 
largest in cusp B-row. b, Buccal (top) and lingual (bottom) views of the 
mandible show the anterior extension of the masseteric fossa to the level below 
p4 and lack of the postdentary trough (Supplementary Fig. 2). The empty 
arrows point to the angular process. c, Line drawings illustrate the cusp 
numbering of M1 and m1 (right column) following ref. 2 and their relationship 
in occlusion (upper left). Grey arrows show the relative movements of a1 of m1 
and A1 of M1 (bottom left). Tooth identification, measurements and 
photographs in Supplementary Information. 


for the buccal row of the lower molar to bite into the valley of the upper 
molars. This is because the tall and inflated a1 at the longitudinal axis of 
m1 and the distally closed central basin in lower molars prevent such 
an occlusion. The cusp shape and arrangement, wear pattern and occlusal 
match of M1 and m1 show that, during mastication, a1 of the lower 
molar must have bitten orthally into the basin of the upper molar in the 
puncture-crushing cycle and then moved palinally within the basin in 
a grinding cycle (Fig. 2 and Supplementary Figs 3-6). In a reversed 
symmetry, A1 of the upper molar bit into the central basin of the lower 
molar and ‘moved’ mesially in the valley of the lower molar. This 
‘double engaged’ occlusion prevents both proal and transverse chewing 
motion; it creates wear in the tooth basin at the distal V-notch and on 
the buccal side of A1-3, but not the lingual side of M1. It also creates 
wear on the lingual and buccal sides of a1 in lower molars. This occlusal 
pattern is unique among mammals and differs from what has been 
interpreted for both haramiyids and multituberculates**"’. The tooth 
morphology and occlusal pattern suggest that Arboroharamiya is 
probably either granivorous, as proposed for some haramiyids*, or 
omnivorous, as in multituberculates’. 
Postcranial features of Arboroharamiya show adaptation for an 
arboreal life. The femur head is not spherical but cylindrical, a derived 
feature present in omomyines and tarsiers'*. Arboroharamiya has rela- 
tively short metapodials but long phalanges (Fig. 1 and Supplemen- 
tary Fig. 7), unique among early Mesozoic mammals but characteristic 
of animals with prehensile hands and feet for arboreal life, such as 
arboreal didelphids and cheirogaleid primates'® (Fig. 3). Even among 
arboreal mammals Arboroharamiya is distinctive in that both manual 
and pedal phalangeal indices, estimates for degree of prehensility of the 
hand-foot’, are clearly higher than those of extant species'®’* and 
extinct scansorial/arboreal species, such as Eomaia and Sinodelphys 
(Supplementary Figs 9 and 10). Moreover, the proximal caudal region 
is relatively long and bears expanded transverse processes. The trans- 
itional and longest caudal vertebrae are more distally positioned than 
those in nonprehensile taxa. These features are functionally related to 
the hypertrophy of the basal musculature necessary for increased gripping 
strength indicative of prehensile ability, as in some extant arboreal 
species’’”°. The postcranial morphologies of Arboroharamiya suggest 
a gracile body for arboreal habitat preference. 

The phylogenetic relationship of Allotheria remains controversial. It 
was presumed that haramiyids and multituberculates were not closely 
related’®, or they formed a clade in which multituberculates are derived 
Figure 3 | Ternary diagrams showing intrinsic manual and pedal ray III 
proportions. Ternary plots showing relative metapodial, proximal and 
intermediate phalangeal lengths for the third digit ray of the hand and foot. The 
lengths of the third metapodial, proximal phalanx and intermediate phalanx 
are shown on their respective axes as a percentage of the combined length of the 
three segments. Compared to both fossil and extant taxa, Arboroharamiya 
jenkinsi has the intrinsic manual and pedal ray proportions typical of arboreal 
species in which the proximal and intermediate phalanges are long in respect to 
the metapodials (Fig. 1 and Supplementary Fig. 7). Abbreviations: Ar, 
A. jenkinsi; Eo, Eomaia scansoria; Je, Jeholodens jenkinsi; Ma, Maotherium 
sinensis; Sb, Sinobaatar lingyuanensis; Sd, Sinodelphys szalayi. Measurements 
and methods in part F of Supplementary Information. 
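
The ray proportions plotted in the ternary diagrams follow directly from the caption: each segment length is expressed as a percentage of the combined length of the three segments. A minimal Python sketch, assuming a hypothetical helper name `ray_proportions` and made-up lengths (these are not measurements from the paper):

```python
def ray_proportions(metapodial, proximal, intermediate):
    """Express each segment of a digit ray as a percentage of the combined length."""
    total = metapodial + proximal + intermediate
    if total <= 0:
        raise ValueError("segment lengths must sum to a positive value")
    return tuple(100.0 * length / total
                 for length in (metapodial, proximal, intermediate))

# Hypothetical lengths in millimetres, for illustration only.
mp_pct, pp_pct, ip_pct = ray_proportions(4.0, 5.0, 3.5)
print(f"metapodial {mp_pct:.1f}%, proximal {pp_pct:.1f}%, intermediate {ip_pct:.1f}%")
```

The three percentages always sum to 100, which is what allows a single point in a ternary plot to represent one digit ray.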
from haramiyids, with the latter being paraphyletic”’’. Because of 
their unique dentition and early occurrences, allotherians were also 
considered to originate early from other mammals, even before mam- 
maliaformes in the Triassic*****. Our phylogenetic analysis supports 
the view that allotherians form a subgroup of the crown Mammalia*”>™* 
and that multituberculates are derived from haramiyids**' (Fig. 4). 
Because Haramiyavia"’ and several other haramiyids*"° are from the 
Upper Triassic (the Norian-Rhaetic), the age for the origin of crown 
mammals would be in the time range of 228-201.3 Myr’, younger 
than that estimated in ref. 7 but older than those in other studies**. 
With the ecological diversification recognized in early mammals”, our 
phylogeny further implies that all major clades and feeding adaptations 
of mammals had diversified during the Jurassic, coincident with inferred 
diversifications of major lineages of insects*”** and angiosperms””””. 
Owing to the fragmentary nature of most haramiyids, a thorough 
phylogenetic analysis of Allotheria remains impractical. However, 
Arboroharamiya demonstrates convincingly that haramiyids had become 
highly specialized in the Jurassic. It displays several mammalian features 
and fills some morphological gaps between Haramiyavia and multi- 
tuberculates. Although morphological characteristics support allotherians 
as a clade, Arboroharamiya shows again that homoplasy is a common 
phenomenon within Mesozoic mammals’. Some features of Arboroharamiya, 
such as the reduced dentition—shared with advanced multituberculates— 
and elongated digits—shared with more advanced arboreal mammals— 
must be convergences. On the other hand, the dentition with multiple 
Figure 4 | Relationship of Arboroharamiya and geological distributions of 
major groups of Mesozoic mammals and their relatives. Thin lines represent 
the phylogenetic relationships and thick lines indicate geological distributions 
of the taxa. This is a simplified consensus tree (Supplementary Fig. 11) of 
12 equally most parsimonious trees of PAUP (Phylogenetic Analysis Using 
Parsimony and Other Methods, version 4.0b), an analysis of 436 characters and 
56 taxa with a focus on Mesozoic non-therian groups (modified from ref. 23; 
parts G-I in Supplementary Information). Test analyses for alternative 
hypotheses are in part J of Supplementary Information. 
premolars in Jurassic multituberculates has to be considered as reversed 
from the condition of Haramiyavia. Regardless of various phylogenetic 
scenarios involving allotherians*>'°****, morphological convergences 
and/or reversals were common in the early stage of mammalian evolution. 


METHODS SUMMARY 


Phylogenetic analyses were based on a data matrix consisting of 436 characters and 
56 taxa (Supplementary Information), of which 389 characters are parsimony- 
informative, and were carried out with PAUP (version 4.0b). All characters were 
unordered and equally weighted, with gaps being treated as ‘missing’ and multistate 
taxa interpreted as polymorphism. Character states were optimized using accelerated 
transformation (ACCTRAN). 
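
To make the term "parsimony-informative" concrete: under the standard definition, a character is informative when at least two of its states each occur in at least two taxa. The sketch below is illustrative only and is not the PAUP implementation. The matrix layout (taxon name mapped to a string of single-character state codes), the helper names and the toy data are all assumptions, and polymorphic codings are not handled.

```python
from collections import Counter

MISSING = {"?", "-"}  # gaps are counted as missing data in this sketch

def is_parsimony_informative(states):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(s for s in states if s not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative(matrix):
    """Count informative characters in a taxon -> state-string matrix."""
    columns = zip(*matrix.values())  # one tuple of states per character
    return sum(is_parsimony_informative(col) for col in columns)

# Hypothetical four-taxon, three-character matrix, for illustration only.
toy = {"taxon_a": "001", "taxon_b": "011", "taxon_c": "1?0", "taxon_d": "110"}
print(count_informative(toy))
```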

Postcranial elements were measured using a digital caliper from 26 arboreal and 
non-arboreal extant mammals, except for Caluromys. Measurements for Caluromys 
and fossils, except for Arboroharamiya, were obtained from the literature. We 
estimated body mass using dimensions of the first lower molar, and conducted 
additional estimates based on the ulna, femur and tibia lengths, respectively. Full 
methods for body mass estimates, digit ray analyses and phylogenetic analyses are 
provided in Supplementary Information. 
Some evolutionary innovations may originate non-adaptively as 
exaptations, or pre-adaptations, which are by-products of other 
adaptive traits'°. Examples include feathers, which originated 
before they were used in flight’, and lens crystallins, which are 
light-refracting proteins that originated as enzymes®. The question 
of how often adaptive traits have non-adaptive origins has profound 
implications for evolutionary biology, but is difficult to address 
systematically. Here we consider this issue in metabolism, one of 
the most ancient biological systems that is central to all life. We 
analyse a metabolic trait of great adaptive importance: the ability of 
a metabolic reaction network to synthesize all biomass from a single 
source of carbon and energy. We use novel computational methods 
to sample randomly many metabolic networks that can sustain life 
on any given carbon source but contain an otherwise random set of 
known biochemical reactions. We show that when we require such 
networks to be viable on one particular carbon source, they are 
typically also viable on multiple other carbon sources that were 
not targets of selection. For example, viability on glucose may entail 
viability on up to 44 other sole carbon sources. Any one adaptation 
in these metabolic systems typically entails multiple potential exap- 
tations. Metabolic systems thus contain a latent potential for evolu- 
tionary innovations with non-adaptive origins. Our observations 
suggest that many more metabolic traits may have non-adaptive 
origins than is appreciated at present. They also challenge our ability 
to distinguish adaptive from non-adaptive traits. 

How evolutionary adaptations and innovations originate is one of 
the most profound questions in evolutionary biology. Previous work'* 
emphasizes the importance of exaptations, also sometimes called pre- 
adaptations, for this origination. These are traits whose benefits to an 
organism are unrelated to the reasons for their origination; they are 
features that originally serve one (or no) function, and become later co- 
opted for a different purpose’®. Although examples of exaptations 
occur from the macroscopic scale to the molecular’® and abound also 
in human evolution’, no number of examples could reveal how import- 
ant exaptations are in the origination of adaptations in general. This 
limitation of case studies can be overcome in those biological systems 
where it is possible to study systematically many genotypes and the 
phenotypes they form*”. 

One of these systems is metabolism. The metabolic genotype of 
an organism encodes a metabolic reaction network with hundreds of 
enzyme-catalysed chemical reactions. One of metabolism’s fundamental 
tasks is to synthesize small biomass precursor molecules from environ- 
mental molecules, such as different organic carbon sources. An organ- 
ism or metabolic network is said to be viable on a carbon source if it is 
able to synthesize all biomass molecules from this source. Viability on a 
new carbon source can be an important adaptation, and anecdotal evid- 
ence shows that this ability can originate as a pre-adaptation’*"*. For 
example, laboratory evolution of Pseudomonas putida for increased 
biomass yield on xylose as a carbon source produces strains that utilize 
arabinose as efficiently as they do xylose, even though the ancestral 
strains did not utilize arabinose’*. Thus, viability on arabinose can be 


a by-product of increased viability on xylose. We here analyse system- 
atically whether such exaptations are typical or unusual in metabolic 
systems. 

Our analysis relies on the ability to predict a metabolic phenotype 
from a metabolic genotype with the constraint-based method of flux 
balance analysis (Methods), to study not just one metabolic network 
but to explore systematically a vast space of possible metabolic net- 
works. The members of this space can be described as follows. The 
currently known ‘universe’ of biochemical reactions comprises more 
than 5,000 chemical reactions with well-defined substrates and pro- 
ducts. In the metabolic network of any one organism, however, only a 
fraction of these reactions take place, enabling us to describe this net- 
work through a binary presence/absence pattern of enzyme-catalysed 
reactions in the known reaction universe. Recent methods based on 
Markov chain Monte Carlo (MCMC) sampling (Methods) allow a 
systematic exploration of this space; that is, they permit the creation 
of arbitrarily large and uniform samples of networks with a given 
phenotype’. This sampling is based on long random walks through 
metabolic network space, where each step in a walk adds or eliminates a 
metabolic reaction from a metabolic network, with the only constraint 
being that the network remains viable on a focal carbon source. The 
starting point of the MCMC random walk is the Escherichia coli meta- 
bolic network, which we know a priori to be viable on different carbon 
sources'*. Here we use this approach to create random samples of 
metabolic networks that are viable on a given set of carbon sources. 
We refer to such networks as random viable networks. 
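
The random walk described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the authors' method: viability is tested here by simple graph reachability from the carbon source to a single `bio` node, as a stand-in for flux balance analysis; reactions are reduced to hypothetical (substrate, product) pairs; and the seven-reaction "universe" and metabolite names are invented.

```python
import random

def is_viable(network, carbon, biomass):
    """Toy viability test: is `biomass` reachable from `carbon`?
    Stand-in for flux balance analysis, which the study actually uses."""
    reached, frontier = {carbon}, [carbon]
    while frontier:
        metabolite = frontier.pop()
        for substrate, product in network:
            if substrate == metabolite and product not in reached:
                reached.add(product)
                frontier.append(product)
    return biomass in reached

def random_walk(start, universe, carbon, biomass, steps, rng):
    """Each step adds or removes one reaction; the change is kept
    only if the network remains viable on the focal carbon source."""
    network = set(start)
    for _ in range(steps):
        reaction = rng.choice(universe)
        candidate = network ^ {reaction}  # toggle presence of the reaction
        if is_viable(candidate, carbon, biomass):
            network = candidate
    return network

# Hypothetical reaction universe and starting network.
universe = [("glc", "g6p"), ("g6p", "pyr"), ("pyr", "bio"), ("glc", "f6p"),
            ("f6p", "pyr"), ("ace", "pyr"), ("sor", "f6p")]
start = {("glc", "g6p"), ("g6p", "pyr"), ("pyr", "bio")}

sample = random_walk(start, universe, "glc", "bio", steps=1000, rng=random.Random(0))
print(sorted(sample))
```

Because every accepted step preserves viability on the focal source, the end point of a long walk is a network that is viable on that source but otherwise randomized.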

Our analysis focuses on 50 biologically relevant and common car- 
bon sources’* (Supplementary Table 1). For each carbon source C, we 
create a sample of 500 random viable networks that are viable on C if it 
is provided as the sole carbon source. We then use flux balance analysis 
to determine the viability of these networks on each of the 49 other 
carbon sources. This approach allows us to ask whether viability on C 
usually entails viability on other carbon sources. The answers to this 
and related questions show that potential exaptations are ubiquitous in 
metabolism. 

We began our analysis with a sample of 500 random networks that 
were viable on glucose as the sole carbon source (Methods). Each net- 
work can synthesize the 63 essential biomass precursors of E. coli— 
many of which are important for most organisms’*'°—in an aerobic 
minimal environment (Methods) containing glucose as the only carbon 
source. Importantly, we did not require that these 500 networks be 
viable on any carbon source except glucose. 

We first examined whether these networks were viable on each of 
the 49 other carbon sources. The information resulting from this ana- 
lysis can be represented, for each network, as a binary ‘innovation 
vector’ whose ith entry equals 1 if the network is viable on carbon source 
C_i, and equals 0 otherwise (Fig. 1a). We define the innovation index, 
I_Glucose, of a network to be the number of additional carbon sources on 
which the network is viable. The distribution of this index is shown in 
Fig. 1b. Ninety-six per cent of networks are viable on other carbon 
sources in addition to glucose (I_Glucose > 0). The mean innovation index 
Figure 1 | Viability on glucose entails viability on multiple other carbon 
sources. a, The binary innovation vector of a hypothetical metabolic network 
that is viable on glucose. The vector shows that the random network is viable 
(labelled by 1) on glucose, sorbitol and fructose, but not viable (labelled by 0) on 
pyruvate and acetate. The innovation index of this network (I_Glucose = 2) is the 
number of additional carbon sources on which the network is viable. b, The 
distribution of innovation indices for 500 random networks viable on glucose. 
Only 4% of networks have I_Glucose = 0, meaning that they are viable only on 
glucose. 


is I_Glucose = 4.86 (standard deviation, 2.83 carbon sources). This means 
that networks viable on glucose typically are also viable on almost 5 
additional carbon sources. Ninety-four networks (18.8%) are viable on 
exactly 5 new carbon sources, and 187 networks (37.4%) are viable on 
6 or more carbon sources. Viability on each such carbon source is a 
potential exaptation. This viability is merely a by-product of viability on 
glucose and could become an adaptation whenever this carbon source is 
the sole carbon source. We also found that different random viable 
networks differ in the additional carbon sources to which they are 
pre-adapted (Supplementary Figs 1 and 2). Most of the 50 carbon 
sources we study confer viability on at least one network in our sample 
(Supplementary Results). Moreover, a variation in our sampling pro- 
cedures that allows only reactions already connected to a metabolism to 
be altered further increases the incidence of exaptation (Methods and 
Supplementary Fig. 3). Finally, complex metabolic networks that have 
more reactions have greater potential for exaptation (Supplementary 
Fig. 4). 
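
The innovation vector and innovation index defined above are straightforward to compute once viability on each carbon source is known. A minimal sketch in Python, using the hypothetical network of Fig. 1a as the example; the function names and the small list of indices passed to `summarize` are assumptions made for illustration, not data from the study.

```python
from statistics import mean

def innovation_index(vector, focal):
    """Number of carbon sources, other than the focal one, on which the network is viable."""
    return sum(viable for source, viable in vector.items() if source != focal)

# Innovation vector of the hypothetical network in Fig. 1a.
fig1a = {"glucose": 1, "sorbitol": 1, "fructose": 1, "pyruvate": 0, "acetate": 0}
print(innovation_index(fig1a, "glucose"))  # 2, matching the caption of Fig. 1a

def summarize(indices):
    """Return (fraction of networks with index > 0, mean index) for a sample."""
    return sum(i > 0 for i in indices) / len(indices), mean(indices)

# Made-up indices for a sample of five networks.
fraction_preadapted, mean_index = summarize([0, 2, 5, 5, 8])
print(fraction_preadapted, mean_index)
```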

We next asked whether the ability to grow on multiple additional 
carbon sources is a peculiarity of networks viable on glucose. To this 
end, we sampled, for each of our remaining 49 carbon sources, 500 
random metabolic networks viable on this carbon source (for a total of 
49 × 500 = 24,500 sampled networks). We then computed the distribution 
of the innovation index, I_C, for each carbon source C. Figure 2a 
shows the mean of this distribution (bars) and its coefficient of vari- 
ation (vertical lines), that is, the ratio of the standard deviation to the 
mean. The figure shows that glucose (highlighted in red) is not 
unusual. Eighteen carbon sources (36%) have a greater average innova- 
tion index than glucose. For example, acetate allows viability on the 
greatest number (9.75) of additional carbon sources. Conversely, some 
carbon sources, such as adenosine (I_Adenosine = 0.27) and deoxyadenosine 
(I_Deoxyadenosine = 0.1), allow growth on fewer additional carbon 
sources than glucose. Carbon sources with a small average innovation 
index—entailing viability on few additional carbon sources—are also 
more variable in innovation index (Spearman’s p = —0.82,P<10— 101 
see also Supplementary Fig. 5). Even though any one carbon source may 
confer growth on only few additional carbon sources in any one network 
(Fig. 2a), when considering all networks in a sample, it may still allow 
pre-adaptation to most other carbon sources (Supplementary Fig. 6). 
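
The two summary statistics used in this paragraph, the coefficient of variation and Spearman's rank correlation, can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: the use of the population standard deviation, the helper names and the toy numbers in the usage lines are assumptions.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean (population s.d. assumed)."""
    return pstdev(values) / mean(values)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy per-source means and coefficients of variation (made-up values).
means = [9.0, 5.0, 2.0, 0.5]
cvs = [0.3, 0.6, 1.1, 2.4]
print(spearman_rho(means, cvs))
```

A negative value of the rank correlation corresponds to the pattern reported in the text: carbon sources with a small mean innovation index are more variable.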
In summary, viability on any one carbon source, C, usually entails 
viability on multiple other carbon sources, whose number and identity 
can vary with C. Viability on carbon sources never before encountered 
is thus a typical metabolic property. Environmental generalists capable 
of surviving on multiple carbon sources may be viable on many more 
carbon sources than occur in their environment (Supplementary 
Tables 2 and 3 and Supplementary Fig. 10). 

We next asked whether metabolically close carbon sources show the 
highest potential for pre-adaptation. The centre path of Fig. 2b shows a 
hypothetical metabolic pathway that leads from one carbon source, C, 
to another, C_new (boxed area), and from there through (possibly multiple) 
further metabolic reactions to the synthesis of biomass. Figure 2c 
shows the same scenario, except that C and C_new are separated by 
several further reactions. It is possible that random networks viable 
on C are more likely to be viable also on C_new if C_new is closer to C, that 
is, if they are separated by fewer metabolic reactions, as in Fig. 2b. In 
this case, metabolite C_new may be less easy to bypass through an 
alternative pathway that originates somewhere between C and C_new 
(right-hand sequence of arrows in Fig. 2c). 

To test this hypothesis (Supplementary Results), we analysed our 50 
samples of 500 random metabolic networks, where networks in each 
sample were required to be viable on a different one of our 50 carbon 
sources. For each sample (carbon source C) and for each of the other 49 
possible carbon sources, C_new, we asked whether the metabolic distance 
between C and C_new is correlated with the fraction of networks 
that are also viable on C_new. To answer this, we used metabolic networks 
that were selected for growth on C and were additionally viable 
on C_new (Methods). We then computed the mean metabolic distance 
and binned the distances. The results, pooled for all networks, are 
shown on the vertical axis of Fig. 2d, whose horizontal axis shows 
the mean metabolic distance (binned into nine bins). The closer 
C_new is to C, the more networks viable on C are also viable on C_new 
(Spearman’s p = —0.42, P= 10 n= 1,990). However, the figure 
also shows that the association is highly noisy, especially at low meta- 
bolic distances. Taking reaction irreversibility into account yields the 
same result (Spearman’s p = —0.39, P= 10 "n= 1,601), as does a 
different way of computing distances between pairs of carbon sources 
(Methods and Supplementary Results). The association is noisy, 
because metabolism is highly reticulate (Supplementary Results). 
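
One simple way to picture a metabolic distance is as the smallest number of reactions linking two metabolites, found by breadth-first search. The sketch below is an assumption-laden illustration and not necessarily how the study computes distances: reactions are reduced to hypothetical (substrate, product) pairs, stoichiometry and cofactors are ignored, and the `reversible` flag is an invented parameter that switches between treating reactions as reversible and honouring their direction.

```python
from collections import deque

def metabolic_distance(reactions, source, target, reversible=True):
    """Fewest reactions separating `source` from `target`; None if unreachable."""
    neighbours = {}
    for substrate, product in reactions:
        neighbours.setdefault(substrate, set()).add(product)
        if reversible:
            neighbours.setdefault(product, set()).add(substrate)
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        metabolite, dist = queue.popleft()
        if metabolite == target:
            return dist
        for nxt in neighbours.get(metabolite, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical network: C -> X -> C_new -> biomass, plus Y -> C.
toy_network = [("C", "X"), ("X", "C_new"), ("C_new", "biomass"), ("Y", "C")]
print(metabolic_distance(toy_network, "C", "C_new"))
print(metabolic_distance(toy_network, "C", "Y", reversible=False))
```

Averaging such distances over all networks viable on both C and C_new would give one value per pair of carbon sources, which could then be binned as in Fig. 2d.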

Although metabolic ‘nearness’ cannot explain exaptations involving 
two carbon sources, biochemical similarities help explain why a network 
viable on C might be viable on one additional carbon source, C_1, but not 
on another source, C_2. Indeed, exaptations often involve carbon sources 
with broadly defined biochemical similarities (Supplementary Figs 7 and 
8). For example, glycolytic carbon sources are more likely to entail 
exaptations for growth on other glycolytic carbon sources, and likewise 
for gluconeogenic carbon sources, as well as for carbon sources involved 
in nucleotide metabolism. Furthermore, we also find that pre-adaptation 
is synergistic; that is, the innovation index for a pair of carbon sources is 
greater than the sum of the innovation indices, I_C1 and I_C2 (Supplementary 
Fig. 9). 

Our analysis has several limitations. First, it is based on present 
knowledge about the reaction universe. Future work may increase 
the number of known reactions, but this would not diminish, and 
could only enhance, the spectrum of possible exaptations. The reason 
is that additional reactions would allow the use of additional carbon 
sources by some metabolic networks. Second, most of our analysis 
focused on random networks that are viable on a specific carbon 
source, but selection in the wild can affect more than viability, which 
may affect the incidence of exaptations. Of special importance is selec- 
tion that favours networks with a high rate of biomass synthesis. This 
particular selective constraint would not affect our conclusions, because 
we found that networks with high biomass synthesis rates have even 
greater potential for metabolic innovation than merely viable networks 
(Supplementary Table 4 and Supplementary Fig. 11). Third, we con- 
sidered all necessary nutrient transporters to be present (Methods). If 
Figure 2 | Innovation varies with respect to the 
carbon source, C, and the mean metabolic 
distance between C and C_new. a, For each of 50 
carbon sources (horizontal axis), the figure 
indicates the mean innovation index (bar) and its 
coefficient of variation (vertical line) for 500 
random networks required to be viable on that 
carbon source. Note the broad distribution of the 
index. Some carbon sources, such as acetate, allow 
viability on more than nine additional carbon 
sources, on average, whereas others, such as 
deoxyadenosine, support viability on fewer than 
one additional carbon source. The innovation 
index of glucose (red) is typical compared with 
other carbon sources. b, A hypothetical carbon 
source, C_new, which can be synthesized from 
another carbon source, C, in one reaction (arrow), 
and which leads, through multiple further 
reactions, to the synthesis of biomass. Some 
metabolic networks may have an alternative 
metabolic pathway that bypasses C_new altogether 
(right-hand sequence of arrows). c, Like b, but with 
C_new and C separated by multiple reactions. The 
fewer reactions separate C and C_new, the more likely 
it is that C_new is not bypassed by some alternative 
metabolic pathway, and that viability on C 
therefore implies viability on C_new. d, Testing the 
hypothesis in c. The horizontal axis shows the 
mean number of reactions that separate C and C_new 
in networks that are viable on both C and C_new, 
binned into integer intervals according to the floor 
of this number (that is, the greatest smaller integer). 
The vertical axis shows the fraction of random 
metabolic networks required to be viable on carbon 
source C that are additionally viable on C_new. We 
note that the potential for innovation decreases 
with increasing distance. Box edges, 25th and 75th 
percentiles; central horizontal line in each box, 
median; whiskers, ±2.7 s.d.; open circles, outliers. 
Data are based on samples of 500 random viable 
networks for each of 50 carbon sources C 
(n = 25,000). 
this is not the case, the incidence of exaptation may be reduced. In this 
regard, we note that 84% of E. coli transporters can transport multiple 
molecules’’ and that their substrate specificity can change rapidly’, 
thus ameliorating this constraint. Fourth, real metabolic networks 
may contain more reactions connected to the rest of metabolism than 
do our randomly sampled networks. However, when restricting our 
analysis to networks in which all reactions are connected, we found 
an even greater incidence of exaptation than in random networks 
(Methods, Supplementary Results and Supplementary Fig. 3). Thus, 
our results provide a lower bound on the incidence of exaptations. 
Finally, most of our analysis is based on sampling a limited number 
of 500 networks viable on each carbon source, but sampling 5,000 
random networks for select carbon sources yielded identical results 
(Supplementary Fig. 12). 

Our observations show that latent metabolic abilities are pervasive 
features of carbon metabolism. They expose non-adaptive origins of 
potentially useful carbon-source utilization traits as a universal and 
inevitable feature of metabolism. The abundance of non-adaptive trait 
origins results from the complexity of metabolic systems, which have 
many enzyme parts that can jointly form multiple metabolic phenotypes, 




but this ability is not restricted to metabolic networks. Many enzymes are 
capable of using various substrates'”'®, which can further increase net- 
work complexity and the potential for exaptation. The ability to form 
multiple phenotypes also occurs in regulatory circuits”, which can form 
different patterns of molecular activity, as well as in RNA molecules”, 
which can form multiple conformations with different biological func- 
tions. Systematic analyses of genotype-phenotype relationships are 
becoming increasingly possible in such systems*”’, and already hint 
at exaptive origins of molecular traits. If confirmed in systematic ana- 
lyses like ours, the pervasiveness of non-adaptive traits may require a 
rethinking of the early origins of beneficial traits. 


METHODS SUMMARY 


We used MCMC random walks that utilize reaction swapping to sample random 
viable metabolic networks, and used flux balance analysis to compute the 
viability of metabolic networks during the MCMC procedure. We performed all 
analyses for minimal aerobic growth environments composed of a sole carbon 
source, along with oxygen, ammonium, inorganic phosphate, sulphate, sodium, 
potassium, cobalt, iron (Fe²⁺ and Fe³⁺), protons, water, molybdate, copper, calcium, 
chloride, magnesium, manganese and zinc. 


METHODS 


Flux balance analysis. Flux balance analysis (FBA) is a constraint-based computational 
method used to predict synthetic abilities and other properties of large 
metabolic networks, which are complex systems of enzyme-catalysed chemical 
reactions. FBA requires information about the stoichiometry of each molecular 
species participating in the chemical reactions of a metabolic network. This stoichiometric 
information is represented as a stoichiometric matrix, S, of dimensions 
m × n, where m denotes the number of metabolites and n denotes the number of 
reactions in a network. FBA also assumes that the network is in a metabolic 
steady state, such as would be attained by an exponentially growing microbial 
population in an unchanging environment. This assumption makes it possible to 
impose the constraint of mass conservation on the metabolites in the network. 
This constraint can be expressed as Sv = 0, where v denotes a vector of metabolic 
fluxes whose entries, v_i, describe the rate at which reaction i proceeds. The solutions, 
or ‘allowable’ fluxes, of this equation form a large solution space, but not all 
of these solutions may be of biological interest. To restrict this space to fluxes of 
interest, FBA uses linear programming to maximize a biologically relevant quantity 
in the form of a linear objective function Z (ref. 25). Specifically, the linear 
programming formulation of an FBA problem can be expressed as 


$$\max\{Z\} = \max\{\, c^{\mathsf{T}} v \mid S v = 0,\ a \le v \le b \,\}$$


The vector c contains a set of scalar coefficients that represent the maximization 
criterion, and the individual entries of vectors a and b respectively contain the 
minimal and maximal possible fluxes for each reaction in v; that is, each entry v_i is 
bounded from below by a_i and bounded from above by b_i. 
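
As a small worked illustration of this formulation, consider a hypothetical three-reaction chain that is not taken from the study: an uptake reaction producing metabolite A (flux v_1), a conversion of A to B (flux v_2), and a biomass reaction consuming B (flux v_3). The bounds are arbitrary illustrative values.

```latex
\begin{align*}
S &= \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}, \qquad
c = (0,\,0,\,1)^{\mathsf{T}}, \qquad
a = (0,\,0,\,0)^{\mathsf{T}}, \qquad
b = (10,\,1000,\,1000)^{\mathsf{T}} \\[4pt]
S v = 0 \;&\Longrightarrow\; v_1 - v_2 = 0 \ \text{ and } \ v_2 - v_3 = 0
        \;\Longrightarrow\; v_1 = v_2 = v_3 \\[4pt]
\max\{Z\} &= \max\{\, c^{\mathsf{T}} v \,\} = \max\{ v_3 \} = 10
        \quad \text{at } v = (10,\,10,\,10)^{\mathsf{T}}
\end{align*}
```

In this toy case the steady-state constraint forces all three fluxes to be equal, so the upper bound on uptake (b_1 = 10) limits the attainable biomass flux.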

We are here interested in predicting whether a metabolic network can sustain 
life in a given spectrum of environments, that is, whether it can synthesize all 
necessary small biomass molecules (biomass precursors) required for survival and 
growth. In a free-living bacterium such as E. coli, there are more than 60 such 
molecules, which include 20 proteinaceous amino acids, DNA and RNA nucleotide 
precursors, lipids and cofactors. We use the E. coli biomass composition to 
define the objective function and the vector c, because most molecules in E. coli’s 
biomass would be typically found in free-living organisms. We used the package 
CLP (1.4, Coin-OR; https://projects.coin-or.org/Clp) to solve the linear program- 
ming problems mentioned above. 

Chemical environments. Along with the biomass composition and stoichi- 
ometric information about a metabolic network, it is necessary to define one or 
more chemical environments that contain the nutrients needed to synthesize 
biomass precursors. Here we consider only minimal aerobic growth environments 
composed of a sole carbon source, along with oxygen, ammonium, inorganic 
phosphate, sulphate, sodium, potassium, cobalt, iron (Fe²⁺ and Fe³⁺), protons, 
water, molybdate, copper, calcium, chloride, magnesium, manganese and zinc. 
When studying the viability of a metabolic network in different environments, we 
vary the carbon source while keeping all other nutrients constant. When we say, 
for example, that a particular network is viable on 20 carbon sources, we mean that 
the network can synthesize all biomass precursors when each of these carbon 
sources is provided as the sole carbon source in a minimal medium. For reasons 
of computational feasibility, we restrict ourselves to 50 carbon sources (Supplemen- 
tary Table 1). They are all carbon sources on which E. coli is known to be viable from 
experiments. We chose these carbon sources because many of them are prominent, 
and because they are of known biological relevance, but we emphasize that our 
observations do not otherwise make a statement about the metabolism of E. coli or 
its close relatives. They apply to metabolic networks that vary much more broadly in 
reaction composition than any relative of E. coli, because of our network sampling 
approach described below, which effectively randomizes the reaction composition 
of a microbial metabolism. 

The known reaction universe. The known reaction universe is a list of metabolic 
reactions known to occur in some organisms. For the construction of this universe, 
we used data from the LIGAND database of the Kyoto Encyclopedia of Genes 
and Genomes. The LIGAND database is divided into two subsets: the 
REACTION database and the COMPOUND database. These two databases 
together provide information about metabolic reactions, participating chemical 
compounds and associated stoichiometric information in an interlinked manner. 

As we described earlier, we specifically used the REACTION and 
COMPOUND databases to construct our universe of reactions while excluding 
all reactions involving polymer metabolites of unspecified numbers of monomers, 
or general polymerization reactions with uncertain stoichiometry; reactions 
involving glycans, owing to their complex structure; reactions with unbalanced 
stoichiometry; and reactions involving complex metabolites without chemical 
information. The published E. coli metabolic model (iAF1260) consists of 
1,397 non-transport reactions. We merged all reactions in the E. coli model with 
the reactions in the LIGAND database, and retained only the non-duplicate reactions. 
After these procedures of pruning and merging, our universe of reactions 
consisted of 5,906 non-transport reactions and 5,030 metabolites. 

Sampling of random viable metabolic networks. In an organism, a metabolic 
network can change through mutations. They can lead to addition of new reac- 
tions, by way of horizontal gene transfer, or through the evolution of enzymes with 
novel activities. They can also lead to loss of reactions through loss-of-function 
mutations in enzyme-coding genes. Natural selection can preserve those changed 
metabolic networks that are viable in a particular environment. Together, muta- 
tional processes and selection may change a metabolic network drastically on a 
long evolutionary timescale. Recent work has shown that even metabolic networks 
that differ greatly in their sets of reactions can have the same metabolic phenotype, 
that is, the same biosynthetic ability. We here use a recently developed MCMC 
random-sampling procedure to generate metabolic networks that are 
viable in specific environments, but that contain an otherwise random com- 
plement of metabolic reactions. Briefly, this procedure involves random walks 
in the space of all possible networks. During any one such random walk, a meta- 
bolic network can change through the addition and deletion of reactions. Although 
this process resembles the biological evolution of metabolic networks through 
horizontal gene transfer and (recombination-driven) gene deletions, we here use 
it for the sole purpose of creating random samples of metabolic networks from the 
space of all such networks. 

In any one MCMC random walk, we keep the total number of reactions at the 
same number as in the starting E. coli network (1,397; ref. 15), to avoid artefacts 
due to varying reaction network size. Specifically, each mutation step in a random 
walk involves the addition of a randomly chosen reaction from the reaction 
universe, followed by the deletion of a randomly chosen metabolic reaction from 
the metabolic network. We call such a sequence of reaction addition and deletion a 
reaction swap. Reaction addition does not abolish the viability of a network in any 
environment. However, reaction deletion might. Thus, after a reaction deletion, we 
use FBA to ask whether the network is still viable, that is, whether it can synthesize 
all biomass precursors, in the specified environment. If so, we accept the deletion; 
otherwise, we reject it and choose another reaction for deletion at random, until we 
have found a deletion that retains viability. After that, we accept the reaction swap, 
thus completing a single step in the random walk. We do not subject transport 
reactions to reaction swaps. These reactions are therefore present in all networks 
generated by our random walk. 
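
The reaction-swap walk described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy reaction universe and the sampling parameters are hypothetical, transport reactions are not modelled, and a simple reachability check stands in for the FBA viability test.

```python
import random


def is_viable(network, source, targets):
    """Toy viability check standing in for FBA (an assumption of this sketch):
    every target metabolite must be reachable from the source."""
    reachable = {source}
    changed = True
    while changed:
        changed = False
        for substrate, product in network:
            if substrate in reachable and product not in reachable:
                reachable.add(product)
                changed = True
    return all(t in reachable for t in targets)


def reaction_swap(network, universe, source, targets, rng):
    """Add one random reaction from the universe, then delete one random
    reaction whose removal keeps the network viable."""
    candidates = sorted(r for r in universe if r not in network)
    trial = network | {rng.choice(candidates)}
    deletable = sorted(trial)
    rng.shuffle(deletable)  # try deletions in random order until one is accepted
    for reaction in deletable:
        reduced = trial - {reaction}
        if is_viable(reduced, source, targets):
            return reduced  # same number of reactions as before the swap
    return network  # not reached when the input network is viable


def sample_networks(start, universe, source, targets,
                    burn_in, interval, n_samples, seed=0):
    """Run one random walk; keep a network after the burn-in and then
    after every `interval` further swaps."""
    rng = random.Random(seed)
    network = frozenset(start)
    samples = []
    total_steps = burn_in + (n_samples - 1) * interval
    for step in range(1, total_steps + 1):
        network = reaction_swap(network, universe, source, targets, rng)
        if step >= burn_in and (step - burn_in) % interval == 0:
            samples.append(network)
    return samples


# Hypothetical toy universe of (substrate, product) reactions.
UNIVERSE = [("glc", "g6p"), ("g6p", "pyr"), ("glc", "pyr"), ("pyr", "aa"),
            ("g6p", "aa"), ("pyr", "lip"), ("aa", "lip"), ("g6p", "lip"),
            ("x", "y")]
START = {("glc", "g6p"), ("g6p", "pyr"), ("pyr", "aa"), ("pyr", "lip")}

samples = sample_networks(START, UNIVERSE, "glc", ["aa", "lip"],
                          burn_in=20, interval=20, n_samples=5)
```

Trying deletions in shuffled order is equivalent to repeatedly drawing a deletion at random until one retains viability. In this sketch the newly added reaction may itself be deleted, which leaves the network unchanged for that step.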

Any MCMC random walk begins from a single starting network, in our case 
that of E. coli. The theory behind MCMC sampling shows that it is important to 
carry out as many reaction swaps as possible for MCMC to ‘erase’ the random 
walker’s similarity (‘memory’) to the initial network. The reason is that successive 
genotypes in a random walk are strongly correlated in their properties, because 
they differ by only one reaction pair. These correlations decrease as the number of 
reaction swaps increases. Because we are interested in analysing growth pheno- 
types of networks, correlations to the initial network would result in identification 
of growth on carbon sources similar to those of the starting network. In past 
work, we found that for the network sizes that we use (1,397 reactions), 
3 × 10³ reaction swaps are sufficient to erase the similarity of the final network 
to the starting network. To err on the side of caution, we thus carried out 5 × 10³ 
reaction swaps before beginning to sample, and sampled a network every 5 × 10³ 
reaction swaps thereafter. In this way, we generated samples of 500 random viable 
metabolic networks through an MCMC random walk of 2.5 × 10⁶ reaction swaps. 
We carried out different random walks to sample networks viable on different 
carbon sources. 

For some of our analyses, we also sampled random metabolic networks of sizes 
different from that of the E. coli metabolic network. To do this, we followed a 
previously established procedure to create a starting network for an MCMC 
random walk that has the desired size. This procedure first converts the known 
universe of reactions into a ‘global’ metabolic network by including the E. coli 
transport reactions in it. Not surprisingly, this global network can produce all 
biomass components and is therefore viable on all carbon sources studied here. We 
used this global network to delete successively a sequence of randomly chosen 
reactions in the following way. After each reaction deletion, FBA was used to 
determine whether the network was still viable on a given carbon source. If so, 
the deletion was accepted; otherwise, another reaction was chosen at random for 
deletion. We deleted in this way as many reactions as needed to generate a network 
of the desired size. We then used this network as the starting network for an 
MCMC random walk, as described above, to generate samples of 500 random 
viable networks. 

Identification of disconnected non-functional reactions. We performed some 
of our analysis with a version of the reaction universe that does not contain dis- 
connected reactions. Reactions that are not connected to the rest of a metabolic 
network would be non-functional, because they cannot carry a non-zero steady-state 
metabolic flux, and thus could not contribute to the synthesis of biomass. The genes 
encoding them would eventually be lost from a genome. (We note that this loss could 
still take tens of thousands of years, given known deleterious mutation rates and 
generation times, which is enough time for other genetic or environmental 
changes to render these reactions functional.) We define a disconnected reaction as a 
reaction that does not share any one substrate or any one product with any other 
reaction in the known reaction universe. We focus here on reactions in the universe 
rather than in one metabolic network, because an individual network can gain 
additional reactions that may connect previously disconnected reactions. We note 
that even this ‘universal’ definition of disconnectedness depends on our current 
knowledge of biochemistry, as well as on the environment, because the right environ- 
ment could supply metabolites that connect previously disconnected reactions or 
pathways to the rest of a metabolic network. To identify the connected universe, we 
removed disconnected reactions. Because this removal may render other reactions 
disconnected, we repeated this process iteratively until no further reactions in the 
universe became disconnected. In this way, we found that 3,646 of the 5,906 reactions 
in the universe of reactions were connected. We used this connected universe in some 
analyses to generate network samples using the MCMC approach. 
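
The iterative pruning described above can be sketched as follows. This sketch rests on one reading of the definition, namely that a reaction counts as disconnected when at least one of its substrates or products occurs in no other reaction of the universe; that reading, the function name and the optional `external` set of exempt metabolites (for example nutrients) are assumptions, not details given in the text.

```python
from collections import Counter


def connected_universe(reactions, external=frozenset()):
    """Iteratively drop disconnected reactions.

    reactions: dict mapping a reaction name to the set of metabolites
               (substrates and products) that take part in it.
    external:  metabolites treated as always connected (assumption).
    """
    kept = dict(reactions)
    while True:
        # Count in how many remaining reactions each metabolite occurs.
        usage = Counter(m for mets in kept.values() for m in set(mets))
        disconnected = [
            name for name, mets in kept.items()
            if any(usage[m] < 2 and m not in external for m in mets)
        ]
        if not disconnected:
            return kept
        for name in disconnected:
            del kept[name]  # removal may disconnect further reactions


# Hypothetical toy universe: r5 is removed first, which then disconnects r4.
toy = {
    "r1": {"A", "B"},
    "r2": {"B", "C"},
    "r3": {"C", "A"},
    "r4": {"C", "D"},
    "r5": {"D", "E"},
}
connected = connected_universe(toy)
```

The loop stops once a full pass removes nothing, which mirrors repeating the process until no further reactions become disconnected.
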

Estimation of the metabolic distance between carbon sources. To compute the 
metabolic distance between a pair of carbon sources, C and C_new, we used the 500 
networks selected for growth on a specific carbon source, C. We first represented a 
network as a substrate graph. In this graph, vertices correspond to metabolites. 
Two metabolites (vertices) are linked by an edge if the metabolites participate in 
the same metabolic reaction, be it as an educt or as a product. We excluded 
‘currency’ metabolites from this substrate graph, which are metabolites that transfer 
small chemical groups and are involved in many reactions. Specifically, we 
excluded protons, H₂O, ATP (adenosine triphosphate), ADP (adenosine diphosphate), 
AMP (adenosine monophosphate), NADP(H) (nicotinamide adenine 
dinucleotide phosphate), NAD(H) (nicotinamide adenine dinucleotide), 
P_i (inorganic phosphate), CoA (coenzyme A), hydrogen peroxide, ammonia, 
ammonium, bicarbonate, GTP (guanosine triphosphate), GDP (guanosine diphosphate) 
and PP_i (inorganic diphosphate) that occurred in both the cytoplasmic 
and periplasmic compartments. In addition, we excluded oxidized and reduced 
forms of cofactors such as quinone, ubiquinone, glutathione, thioredoxin, flavodoxin 
and flavin mononucleotide. That is, we eliminated all vertices corresponding 
to these metabolites when constructing the substrate graph. For each metabolic 
network, we constructed two substrate graphs: one in which the reaction irreversibility 
was ignored and all reactions were considered reversible, and one in which 
irreversibility was taken into account. For a network selected for growth on carbon 
source C, we calculated the shortest distance from C to each exapted carbon source, 
C_new, in the substrate graph of that network, as computed by a breadth-first 
search. We performed this analysis for each network in our ensemble of 500 
networks viable on C. The distance between C and C_new was then computed as the 
mean of the metabolic distances based on networks viable on both carbon sources. 
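
A minimal sketch of the distance computation for the case in which all reactions are treated as reversible is given below. The function names and the toy networks are hypothetical, and each reaction is represented simply as the set of metabolites that take part in it; the variant that respects irreversibility would need a directed graph and is not shown.

```python
from collections import deque
from itertools import combinations


def substrate_graph(reactions, currency=frozenset()):
    """Undirected substrate graph: metabolites that take part in the same
    reaction are linked; currency metabolites are left out."""
    graph = {}
    for metabolites in reactions:
        kept = sorted(set(metabolites) - set(currency))
        for m in kept:
            graph.setdefault(m, set())
        for a, b in combinations(kept, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_distance(graph, start, goal):
    """Breadth-first search; returns None if goal cannot be reached."""
    if start not in graph or goal not in graph:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None


def mean_distance(networks, currency, c, c_new):
    """Mean shortest distance over the networks in which c_new is reachable."""
    distances = [shortest_distance(substrate_graph(n, currency), c, c_new)
                 for n in networks]
    distances = [d for d in distances if d is not None]
    return sum(distances) / len(distances) if distances else None


# Hypothetical toy networks (metabolite names are illustrative only).
net1 = [{"glc", "atp", "g6p", "adp"}, {"g6p", "f6p"},
        {"f6p", "atp", "fbp", "adp"}]
net2 = [{"glc", "fbp"}]
currency = {"atp", "adp"}
d_mean = mean_distance([net1, net2], currency, "glc", "fbp")
```

Leaving ATP and ADP in the graph would shorten the path in the first toy network from three steps to two, which illustrates why currency metabolites are excluded.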

We also computed the metabolic distance for any two carbon sources by repre- 
senting the universe of reactions as a graph in the above manner. We again 
constructed two substrate graphs, as above. Taking irreversibility into account 
increases the maximal distance to infinity because some carbon sources are con- 
nected by irreversible reactions. 

Clustering of carbon sources based on the innovation matrix. Entry I_ij of the 
innovation matrix, I, represents the fraction of random metabolic networks that 
we required to be viable on carbon source C_i and that were additionally viable on 
carbon source C_j. To cluster the entries of this matrix, we first computed for all 
pairs of rows in this matrix the quantity d = 1 − ρ, where ρ is the Spearman rank 
correlation coefficient between the row entries. This yielded a new distance matrix 
that describes the distances between all pairs of rows. We clustered the rows of I by 
applying UPGMA (unweighted pair group method with arithmetic means), a 
hierarchical clustering method, to the distance matrix. 

Hierarchical clustering with UPGMA classifies data such that the average distance 
between elements belonging to the same cluster is lower than the average 
distance between elements belonging to different clusters. UPGMA identified 
two clusters of glycolytic and gluconeogenic carbon sources, and we wanted to 
know whether the distances between them were significantly different. To this end, 
we first calculated the distribution of distances d = 1 − ρ for all pairs of row 
vectors of I within each of the two clusters. We called the resulting distance 
distribution the ‘within-cluster’ distance distribution. Similarly, we computed 
the distances between any pair of row vectors belonging to two different clusters. 
These formed a ‘between-cluster’ distance distribution. We then used the non- 
parametric Mann-Whitney U-test to check whether these two distributions were 
significantly different. 
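
The row-to-row distance d = 1 − ρ used above can be sketched in plain Python as follows. Only the distance step is shown; the UPGMA clustering and the Mann-Whitney U-test are not reproduced. The function names and the example matrix are hypothetical, ties receive average ranks, and rows with constant entries are not handled.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_distance(row_a, row_b):
    """d = 1 - rho, with rho the Spearman rank correlation of the two rows."""
    return 1.0 - pearson(ranks(row_a), ranks(row_b))


def distance_matrix(rows):
    """All pairwise row distances of a matrix given as a list of rows."""
    return [[spearman_distance(a, b) for b in rows] for a in rows]


# Hypothetical 3 x 3 matrix, for illustration only.
example = [[0.1, 0.5, 0.9],
           [0.2, 0.3, 0.8],
           [0.8, 0.3, 0.2]]
D = distance_matrix(example)
```

Rows with the same rank order have a distance of 0 and rows with exactly reversed rank order have a distance of 2, so d ranges from 0 to 2.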

Estimation of carbon waste production. FBA determines the maximal biomass 
yield achievable by a network for a given carbon source. However, even when a 
network produces the maximally achievable yield, not all of the carbon input into 
the network may be converted into biomass. The non-converted carbon input 
constitutes carbon waste. Such unused carbon can be secreted in the form of one or 
more metabolites. For example, in a glucose minimal environment E. coli secretes 
carbon dioxide and acetate into the extracellular compartment as carbon waste. 
FBA estimates the amount of each metabolite secreted per unit time. To estimate 
the amount of carbon waste that a random network viable on glucose produces, 
we first identified the different metabolites that it secretes as waste and then 
computed the amount of carbon waste per metabolite as the product of carbon 
atoms in that metabolite and the amount of the metabolite secreted (millimoles per 
gram dry weight per hour). The total carbon waste produced by a network is com- 
puted as the sum of the above quantity for each of the secreted carbon-containing 
molecules. We repeated the above procedure for each random network in a sample 
of 500 random networks viable on glucose. We found a total of 62 metabolites that 
are secreted as waste metabolites in at least one network of our sample of networks 
viable on glucose. 
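
The carbon-waste sum described above amounts to the following calculation. The function name, the metabolite names and the secretion fluxes are made-up illustrative values, not results from the study.

```python
def carbon_waste(secretion_flux, carbon_atoms):
    """Total carbon waste of one network.

    secretion_flux: metabolite -> amount secreted
                    (millimoles per gram dry weight per hour)
    carbon_atoms:   metabolite -> number of carbon atoms in the metabolite
    """
    return sum(
        carbon_atoms[metabolite] * flux
        for metabolite, flux in secretion_flux.items()
        if flux > 0 and carbon_atoms[metabolite] > 0
    )


# Hypothetical secretion profile of a single network.
flux = {"co2": 10.0, "acetate": 2.5, "h2o": 30.0}
atoms = {"co2": 1, "acetate": 2, "h2o": 0}
waste = carbon_waste(flux, atoms)  # 10.0 * 1 + 2.5 * 2
```

Metabolites without carbon, such as water, contribute nothing to the sum.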

We carried out all numerical analyses using MATLAB (Mathworks Inc.). 

The HeLa cell line was established in 1951 from cervical cancer cells 
taken from a patient, Henrietta Lacks. This was the first successful 
attempt to immortalize human-derived cells in vitro. The robust 
growth and unrestricted distribution of HeLa cells resulted in its 
broad adoption, both intentionally and through widespread cross-contamination, 
and for the past 60 years it has served a role analogous 
to that of a model organism. The cumulative impact of the HeLa 
cell line on research is demonstrated by its occurrence in more than 
74,000 PubMed abstracts (approximately 0.3%). The genomic architecture 
of HeLa remains largely unexplored beyond its karyotype, 
partly because like many cancers, its extensive aneuploidy renders 
such analyses challenging. We carried out haplotype-resolved whole-genome 
sequencing of the HeLa CCL-2 strain, examined point- and 
indel-mutation variations, mapped copy-number variations and loss 
of heterozygosity regions, and phased variants across full chro- 
mosome arms. We also investigated variation and copy-number pro- 
files for HeLa S3 and eight additional strains. We find that HeLa is 
relatively stable in terms of point variation, with few new mutations 
accumulating after early passaging. Haplotype resolution facilitated 
reconstruction of an amplified, highly rearranged region of chro- 
mosome 8q24.21 at which integration of the human papilloma virus 
type 18 (HPV-18) genome occurred and that is likely to be the event 
that initiated tumorigenesis. We combined these maps with RNA-seq 
and ENCODE Project data sets to phase the HeLa epigenome. This 
revealed strong, haplotype-specific activation of the proto-oncogene 
MYC by the integrated HPV-18 genome approximately 500 kilobases 
upstream, and enabled global analyses of the relationship between 
gene dosage and expression. These data provide an extensively phased, 
high-quality reference genome for past and future experiments relying 
on HeLa, and demonstrate the value of haplotype resolution for char- 
acterizing cancer genomes and epigenomes. 

We generated a haplotype-resolved genome sequence of HeLa CCL-2 
using a multifaceted approach that included shotgun, mate-pair and 
long-read sequencing, as well as sequencing of pools of fosmid clones 
(Supplementary Table 1). To catalogue variants, we carried out conventional 
shotgun sequencing to 88× non-duplicate coverage and 
reanalysed 11 control germline genomes in parallel (Supplementary 
Tables 2 and 3). Although normal tissue corresponding to HeLa is 
unavailable, the total number of single-nucleotide variants (SNVs) 
identified in HeLa CCL-2 (n = 4.1 × 10⁶) and the proportion overlapping 
with the 1000 Genomes Project (90.2%) were similar to controls 
(mean n = 4.2 × 10⁶ and 87.7%, respectively), suggesting that HeLa 
has not accumulated appreciably large numbers of somatic SNVs relative 
to inherited variants. Indel variation was unremarkable after 
accounting for differences in coverage (Supplementary Fig. 1). Short 
tandem repeat profiles of HeLa also resembled controls, consistent 
with mismatch repair proficiency (Supplementary Fig. 2). 

After removing protein-altering variants that overlapped with the 
1000 Genomes Project or the Exome Sequencing Project, similar 
numbers of private protein-altering (PPA) SNVs were found in 
HeLa (n = 269) and controls (mean n = 391). Gene ontology analysis 
found that all terms enriched for PPA variants in HeLa (P = 0.01) were 
also enriched in at least one control (except for ‘startle response’ in 
HeLa), suggesting that known cancer-related pathways are not perturbed 
extensively by point or indel mutations (Supplementary Fig. 3). 
Although a previous study of the HeLa transcriptome reported an 
enrichment of putative mutations in cell-cycle- and E2F-related genes, 
subsequently generated population-scale data sets contain all variants 
that we observed in these genes, suggesting that they are inherited and 
benign rather than somatic and pathogenic. 

The overlap between PPA variants and the Catalogue of Somatic 
Mutations in Cancer (COSMIC) was similar for HeLa (n = 1) and 
control genomes (mean n = 2.6). The gene-level overlap with the 
Sanger Cancer Gene Census (SCGC) was also similar for HeLa 
(n = 4) and control genomes (mean n = 8.7). Canonical tumour suppressors 
and oncogenes were notably absent among the five SCGC 
genes with PPA variants in HeLa (BCL11B (B-cell CLL/lymphoma 11B 
(zinc finger protein)), EP300 (E1A binding protein p300), FGFR3 
(fibroblast growth factor receptor 3), NOTCH1 and PRDM16 (PR 
domain containing 16); Supplementary Tables 3-6). However, three 
are associated with HPV-mediated oncogenesis (FGFR3, EP300, 
NOTCH1) and may be ancillary to the dominant role of HPV oncoproteins 
in HeLa and other HPV+ cervical carcinomas. Mutations in 
FGFR3 have been noted previously in cervical carcinomas, although 
infrequently and at different residues than observed here. Both 
EP300 and NOTCH1 are recurrently mutated in diverse cancers and 
are involved in Notch signalling, a pathway that is dysregulated in 
HeLa. EP300, which encodes the transcriptional co-activator p300, 
interacts directly with viral oncoproteins such as HPV-16 E6 and 
HPV-16 E7 (ref. 16). Although the in-frame deletion of a highly conserved 
amino acid in EP300 seems to be somatic (heterozygous within a 
loss-of-heterozygosity (LOH) region), it is still possible that the others 
are rare, inherited variants or passenger mutations. Further studies are 
required to resolve their functional relevance and to assess whether 
these genes are recurrently altered in HPV+ cervical carcinomas. 

Aneuploidy and LOH, which are hallmarks of cancer genomes, 
were mapped in HeLa by constructing a digital copy-number profile at 
kilobase resolution (Fig. 1, Supplementary Fig. 4 and Supplementary 
Table 7). Read coverage profiles were segmented by a Hidden Markov 
Model (HMM) and recalibrated to account for widespread aneuploidy 
(Supplementary Figs 5 and 6). Sixty-one per cent of the genome has a 
baseline copy number of three, and only a small minority (3%) has a copy 
number of greater than four or less than two (Supplementary Table 8). 
LOH encompassed 15.7% of the genome, including several entire chromosome 
arms (5p, 6q, Xp, Xq) or large distal portions (2q, 3q, 6p, 11q, 
13q, 19p, 22q) (Supplementary Fig. 7 and Supplementary Table 9), consistent 
with previous descriptions of LOH in cervical carcinomas. The 
overall profile is consistent with published karyotypes of various HeLa 
strains, suggesting that the hypertriploid state arose either during tumorigenesis 
or early in the establishment of the HeLa cell line. 


1Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA. 
*These authors contributed equally to this work. 
Structural variants were identified by clustering discordantly 
mapped reads from 40-kb and 3-kb mate-pair libraries (Supplementary 
Fig. 8). Twenty interchromosomal links were identified, including 
links for marker chromosomes M11 (9q33-11p14) and M14 (13q21-19p13). 
In addition, 209 HeLa-specific deletions and 8 inversions were 
found (Supplementary Figs 9 and 11, and Supplementary Table 10). 
Only two genes that are impacted by HeLa-specific structural rearrangements 
(Supplementary Table 11) intersected with SCGC (STK11 
(ref. 18), FHIT), both of which are recurrently deleted in cervical 
carcinomas. 

Conventional whole-genome sequencing fails to resolve haplotype 
phase, an essential aspect of the description and interpretation of non-haploid 
genomes, including cancer genomes. Recently, several 
groups have demonstrated genome-wide measurement of local or 
sparse haplotypes, but these approaches have yet to be applied to 
aneuploid cancer genomes. To resolve haplotype phase across the 
HeLa genome, we sequenced pools of fosmid clones. Specifically, 
we constructed three complex fosmid-clone libraries, and then carried 
out limiting dilution and shotgun sequencing of 288 fosmid clone 
pools. In summary, these were estimated to include 518,293 individual 
non-overlapping clones with a median insert size of 33 kb, for a total 
physical coverage of 6.3× of the haploid reference genome (Supplementary 
Fig. 12). The complement of likely inherited heterozygous 
variants (SNP and indel, n = 1.97 × 10⁶) was ascertained by shotgun 
sequencing and by cross-referencing with calls made by the 1000 
Genomes Project, and then re-genotyped using reads from each clone 
pool. Alleles that were present at distinct heterozygous sites within a 
given clone were assigned, or ‘phased’, to the same inherited haplotype, 
and the unobserved alleles were implicitly phased to the opposite 
haplotype. When overlapping clones from distinct pools were merged, 
this resulted in haplotype blocks with an N50 (the contig size above 
which 50% of the total length of the haplotype assembly is included) of 
550 kb containing 90.6% of heterozygous variants that were probably 
inherited. 

Figure 1 | Haplotype-resolved copy number of the HeLa cancer cell line 
genome. a, Copy-number profile of HeLa split by haplotypes. Links … 
locations; S, links confirmed by mate-pair sequencing). b, Windowed 
copy-number ratios for HeLa CCL-2 (green and purple, alternating 
chromosomes) and HeLa S3 (grey), with predicted integer copy number 
for S3 (black). Notable strain … 
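
The N50 statistic quoted for the haplotype blocks can be computed as in the sketch below, which uses the common convention that N50 is the length of the block at which the running total, taken from the longest block downwards, first reaches half of the summed length. The function name and the block lengths are illustrative only.

```python
def n50(lengths):
    """Length L such that blocks of length >= L together cover at least
    half of the total length; returns None for an empty input."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return None


# Hypothetical block lengths in kb.
blocks = [2, 3, 4, 5, 6]
block_n50 = n50(blocks)  # 6 + 5 = 11 is the first running total >= 20 / 2
```
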

Most of the HeLa genome is present at an uneven haplotype ratio 
(for example, 2:1 in regions in which copy number = 3). We sought to 
exploit the resulting allelic imbalance to phase consecutive haplotype 
blocks (Supplementary Fig. 13). We first calculated the cumulative 
allelic ratio among shotgun reads for the SNVs residing in each hap- 
lotype block, which clustered closely with the underlying haplotype 
ratio. For example, in non-LOH regions with a copy number of 3 that 
have ratios of 2:1 or 1:2, allelic ratios calculated for each block had 
distributions centred on 0.32 or 0.65, close to the expected fractions of 
one-third and two-thirds (Supplementary Fig. 14). Using these ratios, 
we merged haplotype blocks into scaffolds covering 1.96 Gb or 90.3% 
of the non-LOH HeLa genome (scaffold N50 of 44.8 megabases (Mb); 
Supplementary Table 12). The haplotype-resolved scaffolds were then 
merged with the copy-number map to produce a global, haplotype- 
resolved copy-number profile of the aneuploid HeLa genome (Fig. 1a, 
Supplementary Fig. 15 and Supplementary Table 13). 
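
The link between an observed allelic ratio and the underlying haplotype ratio can be illustrated as follows. This is a simplified nearest-expected-fraction rule written for illustration, not the procedure used in the study; the function names are hypothetical.

```python
def expected_fractions(copy_number):
    """Expected fraction of reads from haplotype A for every split (a, b) of
    the copy number in which both haplotypes are present."""
    return {(a, copy_number - a): a / copy_number
            for a in range(1, copy_number)}


def assign_haplotype_ratio(observed_fraction, copy_number):
    """Pick the (A copies, B copies) split whose expected fraction lies
    closest to the observed allelic fraction of a haplotype block."""
    fractions = expected_fractions(copy_number)
    return min(fractions, key=lambda split: abs(fractions[split] - observed_fraction))


# Block-level fractions centred on 0.32 and 0.65, as reported for regions
# with a copy number of 3.
low = assign_haplotype_ratio(0.32, 3)
high = assign_haplotype_ratio(0.65, 3)
```

With a copy number of 3 the only candidate splits are 1:2 and 2:1, with expected fractions of one-third and two-thirds, so the two observed values map to opposite splits.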

Phasing accuracy was independently confirmed by several methods. 
First, 99.7% of informative read pairs from 3-kb mate-pair sequencing 
(each read overlapping a phased site) were concordant with the predicted 
phase. Second, long-insert single-molecule sequencing (Pacific 
Biosciences RS; mean, 2.97 kb; 90th percentile, 5.1 kb among informative 
reads) showed that 97.2% of reads were in perfect agreement with 
the predicted phase, despite the high per-base sequencing error rate of 
approximately 15% (Supplementary Fig. 16). Third, examination of 
allelic state across 47.3 Mb of chromosome 18q, which underwent 
LOH in HeLa S3 but not in CCL-2, showed that out of the 17,761 affected 
alleles (heterozygous in CCL-2 but at an allele balance of greater 
than 0.9 among S3 reads), 99.7% corresponded to those phased 
together on haplotype A in CCL-2 (Supplementary Fig. 17). Finally, 
windowed analysis of population allele frequencies revealed probable 
African or European genetic ancestry across long stretches of the 
haplotype-resolved genome, consistent with recent admixture and a 
low switch error rate (Supplementary Figs 18 and 19). 

To measure the frequency of new mutations in the HeLa genome, we 
examined amplified haplotypes for de facto somatic mutations occur- 
ring during tumorigenesis or early in the cell line’s subsequent passaging. 
Within LOH regions, these appear as polymorphisms; 2,883 such sites 
(mean, 1.31 per haploid Mb; Supplementary Table 14) were confirmed 
by clone-pool sequencing and allele frequency in shotgun sequencing 
(Supplementary Figs 20 and 21). In non-LOH regions, in which one 
haplotype is amplified but both remain present, the majority of observed 
heterozygous sites are inherited, as reflected by their substantial overlap 
with variants from the 1000 Genomes Project (86.7%, n = 2,339,608). 
Excluding these and sites found in the 11 control genomes, 5,282 sites 
(mean, 1.32 per haploid Mb) remained at which clones differed in geno- 
type between the two or more amplified copies of the same germline 
haplotype, with little regional variation in the abundance (Supplemen- 
tary Fig. 22). In summary, 8,165 somatic mutations were validated with 
an estimated sensitivity of 61.1%, placing an upper bound on the point- 
mutational burden sustained by HeLa CCL-2 after aneuploidy. Despite 
many additional doublings in culture, this point-mutation frequency 
(2.16 per Mb) is on the lower end of frequencies observed across differ-
ent cancer genomes. However, without estimates for parameters such
as the number of doublings during tumorigenesis, the count of cells 
explanted, and the number of passages in culture, this estimate of 
post-aneuploidy mutational burden cannot be rescaled to a rate per base 
per division. 
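
The counts in the paragraph above can be checked with a few lines of arithmetic. The sketch below sums the two validated site classes reported in the text and, purely as an illustrative assumption, divides the total by the estimated sensitivity to show the scale of a sensitivity-adjusted count. The function names are hypothetical, and the adjustment is not taken from the paper's Supplementary Notes.

```python
# Counts quoted in the text above.
LOH_SITES = 2883        # somatic sites confirmed within LOH regions
NON_LOH_SITES = 5282    # somatic sites confirmed in non-LOH regions
SENSITIVITY = 0.611     # estimated detection sensitivity


def validated_total(loh_sites: int, non_loh_sites: int) -> int:
    """Sum the validated somatic sites from the two region classes."""
    return loh_sites + non_loh_sites


def sensitivity_corrected(count: int, sensitivity: float) -> float:
    """Scale an observed count by detection sensitivity.

    Illustrative assumption only: a simple division by sensitivity.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    return count / sensitivity


total = validated_total(LOH_SITES, NON_LOH_SITES)
corrected = sensitivity_corrected(total, SENSITIVITY)
print(total, round(corrected))
```

The sum reproduces the 8,165 validated mutations stated in the text.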

Four years after the initial establishment of the HeLa cell line, several 
additional strains were cloned. One of these, HeLa S3, remains in
widespread use today and has been profiled extensively as part of the
ENCODE Project. To investigate the divergence between CCL-2 and
S3, we carried out shotgun sequencing of S3 to 26× coverage. Outside
of S3-specific regions of LOH, 94.5% of rare variants in CCL-2 were
shared with S3 (n = 204,841 sites excluding 1000 Genomes Project and
segmental duplications, and requiring ≥8× coverage in each genome;
Supplementary Fig. 23 and Supplementary Table 15). Somatic mutations
were also shared, though to a lesser degree: 72.4% of clone-confirmed
somatic mutations from CCL-2 were found in S3 (n = 8,054 sites
with ≥8× coverage in S3), consistent with a low rate of somatic SNV
accumulation since the strains diverged in 1955. 

The copy-number profile of HeLa S3 broadly mirrors that of CCL-2
(Fig. 1b, and Supplementary Figs 7 and 24) as well as eight additional
HeLa strains that we sequenced lightly (3.5× to 4.3×). We observed
some strain-specific differences (Supplementary Figs 25-27), consis- 
tent with previous reports of karyotypic heterogeneity both among and 
within strains. Despite some variability, a copy number of three was 
the dominant state consistently, with a median of 52% of the genome 
across the eight strains (range 38-60%), similar to its prevalence in 
CCL-2 (61%). Gains or losses of entire chromosome arms were 
observed (for example, chr18q, HeLa S3 (Fig. 1b), chr9p, CCL-13 
(Supplementary Figs 28 and 29)), but smaller amplifications and dele- 
tions were more common. These may correspond to variability in copy 
rather than in the content of marker chromosomes present, as sug-
gested by high overall breakpoint concordance between strains (81% of
copy-number breakpoints within ±1 Mb were present in ≥2 strains).
The additional eight cell lines analysed here were identified in the
1970s as products of HeLa contamination into other tissue cultures
in the preceding two decades. Their shared set of structural abnormal- 
ities reflects their common origin from small founder populations of 
contaminating cells and reinforces the view that the structural rear- 
rangements resulting in marker chromosomes arose early and are 
variable in copy number. 

Nearly all cervical cancer is caused by human papillomavirus (HPV) 
infection. Within HeLa, a partial copy of the HPV-18 genome is inte- 
grated at a known fragile site on chromosome 8q24.21 (refs 25, 26). 
Haplotype and copy-number maps indicate that the flanking regions 
are present at copy number four, at a haplotype ratio of 3:1. To char- 
acterize the structure and copy number of the insertion, we included 
the HPV-18 genome alongside the human reference during align- 
ment of clone-pool reads. By analysing patterns of coverage from 
breakpoint-spanning fosmid clones, read-depth data and breakpoint 
sequencing, we generated a structural model for the viral integration 
(Fig. 2a, b, and Supplementary Figs 30 and 31). Two repeat structures 
(which we designate R1 and R2) consisting of the partial viral genome 
are interspersed with regions of human chromosome 8q24.21 genomic 
DNA. The viral genome is present with identical breakpoints on each 
copy of the amplified haplotype, to the exclusion of the other haplo- 
type, which remains at single copy and lacks integration-associated 
rearrangements, confirming that integration and rearrangement pre- 
ceded aneuploidy. The integrated structure contains only two-thirds of 
the complete HPV-18 genome, including full-length copies of the E6 
and E7 oncogenes necessary for telomerase activity (amplified to a 
copy number of approximately 12), but lacking a functional copy of 
E2, an inhibitor of E6 and E7 (ref. 13) (Fig. 2c). In addition, a distinct 
portion of the HPV-18 genome, amplified to a copy number of 
approximately 30 in HeLa, includes an epithelium-specific enhancer 
Figure 2 | HeLa HPV integration locus. a, Chromosome 8 read depth
flanking the HPV integration site (top, blue line), windowed copy-number 
ratios (purple points, shaded by segment) and integer copy states (red bars, 
middle), and corresponding segments and breakpoints (circled numbers with 
genomic coordinates, bottom). b, Proposed HPV integration structure: per- 
segment copy number (top left), non-rearranged haplotype B (copy 

number = 1, top right), rearranged haplotype A with HPV insertion (copy 
number = 3, bottom) carrying approximately 3 and 6 tandem copies of repeats 
R1 and R2, respectively. Hap, haplotype. c, The partial HPV-18 genome and 
corresponding genes (grey and blue, top) with breakpoints highlighted by 
numbered circles. For reference, the entire HPV-18 genome is shown (bottom). 
Figure 3 | Gene expression by copy number and haplotype in HeLa S3.

a, Transcript abundance (reads per kilobase per million (RPKM), for genes with
an RPKM ≥1) is positively correlated with gene copy. b, Expression per copy
(RPKM per gene copy number) does not correlate with copy number. 

c, Fractional contribution of haplotype A to overall expression (Hap A/total) 
(RPKM averaged across megabase windows at phased sites) split by 


that controls E6 and E7 transcription, possibly contributing to their
high expression (Supplementary Fig. 32). 

Extensive sequencing-based functional genomic data have been 
generated on HeLa and other cancer cell lines by the ENCODE 
Project, but these have the potential to be misinterpreted if their
analysis does not account for aneuploidy and phase. As HeLa CCL-2 
and S3 are nearly identical in genotype, we used haplotype and copy- 
number maps of CCL-2 to assign phase to publicly available functional 
data generated on S3 (ref. 7), including transcription-factor binding, 
chromatin modification and chromatin-accessibility data sets. We also 
calculated haplotype-specific gene-expression scores using RNA 
sequencing (RNA-seq) data generated in this study and by others
(Supplementary Figs 33-35). For each data set, aligned reads were
phased by comparison to HeLa CCL-2 haplotype blocks. Corres-
ponding peak scores (chromatin immunoprecipitation followed by
high-throughput sequencing (ChIP-seq) and DNase I sequencing
(DNase-seq)) or gene-expression values (RNA-seq) called from the
full set of reads were divided proportionally based on the abundance of 
phase-informative mapping to each haplotype, normalized to each 
haplotype’s estimated copy number. Mapping to the human reference 
genome imposed a slight bias, favouring the reference allele by an 
haplotype-resolved copy number. Open circles indicate expected fractions.

d, Haplotype-A-specific expression in HeLa S3 but not CCL-2 across 
S3-specific LOH on chr18q. e, Haplotype A fractional contribution to 
expression across the genome, colour-coded by underlying haplotype-resolved 
copy number as in c (point size represents the log-scaled total RPKM; grey boxes
indicate HeLa S3 LOH).


average of 1.08-fold. We constructed two HeLa-specific reference 
sequences by introducing all SNVs from each haplotype onto one or 
the other; mapping to this reference mitigated most of the bias (to 1.02- 
fold, or a 75% reduction; Supplementary Figs 36-38). 
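
The proportional division of scores described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and argument layout are hypothetical, and the inputs are taken to be a peak score (or RPKM value), the counts of phase-informative reads assigned to each haplotype, and each haplotype's estimated copy number.

```python
def split_score(score, reads_a, reads_b, copies_a, copies_b):
    """Divide a score between haplotypes A and B.

    The score is split in proportion to phase-informative read counts,
    then each share is normalised by the haplotype's copy number.
    Returns None when no read is phase-informative.
    """
    informative = reads_a + reads_b
    if informative == 0:
        return None
    share_a = score * reads_a / informative
    share_b = score * reads_b / informative
    # Per-copy values are undefined for a haplotype that is absent (LOH).
    per_copy_a = share_a / copies_a if copies_a > 0 else None
    per_copy_b = share_b / copies_b if copies_b > 0 else None
    return {"A": share_a, "B": share_b,
            "A_per_copy": per_copy_a, "B_per_copy": per_copy_b}


def expected_fraction_a(copies_a, copies_b):
    """Fraction of signal expected from haplotype A if every copy contributes equally."""
    return copies_a / (copies_a + copies_b)


# Hypothetical locus: haplotype A at two copies, haplotype B at one copy.
example = split_score(90.0, reads_a=60, reads_b=30, copies_a=2, copies_b=1)
print(example, expected_fraction_a(2, 1))
```

In this hypothetical example the per-copy values of the two haplotypes are equal, which is the pattern expected when each haplotype copy contributes comparably to the signal.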

Across the HeLa genome, gene expression is significantly correlated 
with copy number (P = 0.075; Fig. 3a, b), suggesting a minimal role for 
gene-dosage buffering. Moreover, on average, each haplotype copy 
makes a comparable contribution to the transcriptome, despite uneven 
amplification and, in some cases, rearrangement (Fig. 3c, e). This trend 
is also observed for histone modifications, DNase hypersensitivity and 
transcription factor binding (Supplementary Figs 39 and 40). 
Transcript allele balances at sites heterozygous in CCL-2 on chro- 
mosome 18q closely followed the genomic balance (mean 66% repres- 
entation of the A allele (two-thirds was expected)), but S3 nearly 
exclusively matched the A allele (94% of reads), reflecting the S3- 
specific LOH event (Fig. 3d). However, a small number of regions 
showed strong imbalances between each haplotype’s contribution to 
overall patterns of expression, chromatin modification and transcrip- 
tion-factor binding (2.4% of ENCODE peaks, excluding those in LOH 
regions; Supplementary Figs 41-44). Interestingly, the HPV-18 inser- 
tion locus and proto-oncogene MYC (separated by approximately 
Figure 4 | Haplotype-specific regulation near the HPV integration site.

a, Long-range chromatin interactions between the HPV and MYC loci 
demonstrated by ChIA-PET with the RNA polymerase II signal (top) shown
for HeLa S3 and an HPV-negative cell line (K562). Chromatin interactions (middle)
are indicated by a green arrow. Bar graphs (bottom) show read counts at
phased, informative sites in MYC (red, haplotype A; blue, haplotype B).

b, Transcript abundance in HeLa S3 across the MYC locus measured by RNA- 
seq. Overall coverage is shown in grey (top) with phased, informative sites 
highlighted by green ticks (pink text, non-reference alleles). Haplotype 
contributions at each variant are shown in bar graphs (bottom), as in a. 
500 kb) were among the regions with the most highly haplotype-
imbalanced regulation in the genome (Supplementary Fig. 45).
Phased RNA-seq data indicate that MYC is highly expressed, but 
almost exclusively from the HPV-18-integrated haplotype (mean 
ratio, 95:1; Fig. 4b and Supplementary Fig. 46). Phased ENCODE 
tracks and long-range chromatin interaction data (ChIA-PET (chro-
matin interaction analysis with paired-end tag sequencing); Fig. 4a
and Supplementary Fig. 47) across the region indicate that transcrip- 
tion-factor occupancy, active chromatin marks and long-distance 
physical contacts are also nearly exclusive to the HPV-integrated, 
transcriptionally active haplotype. Taken together, these data implic-
ate viral integration as a strong activator of MYC expression, acting in
cis rather than in trans and possibly mediated by the epithelium-spe-
cific viral enhancer amplified to a copy number of approximately 30
within the R1 repeat structure (Fig. 2b). This strong cis interaction,
between the amplified, integrated genome of a DNA tumour virus and
a canonical proto-oncogene, may underlie the robust growth char-
acteristics of the HeLa cell line, and provides indirect support for the
hypothesis that inherited risk loci for cancer at chromosome 8q24
operate through activation of MYC.

In summary, we present a haplotype-resolved genome and a 
haplotype-resolved epigenome of a human cancer. Our study not only 
provides an overdue genomic analysis of the human cell line that is 
possibly the most commonly used in biomedical research but also 
represents a unique view into a cancer genome and epigenome enabled 
by the acquisition of haplotype information. 


METHODS SUMMARY 


Cells were maintained at 37 °C in DMEM F-12. Shotgun libraries were prepared by
conventional ligation-based methods and sequenced on an Illumina HiSeq 2000
instrument. Point variants were called using shotgun sequence reads. Copy-
number maps were created from read depth. Long-insert clone-dilution pools
were created and analysed as described previously. Data sets used for each analysis
are depicted as a flow chart in Supplementary Fig. 48. Full methods and associated 
references can be found in the online version of the paper and in Supplementary 
Notes 1-23. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

HeLa cell culture. HeLa cell cultures (HeLa ATCC, CCL-2 (laboratory stock); 
HeLa S3 ATCC, CCL-2.2 (laboratory stock); Chang liver ATCC, CCL-13; L132 
ATCC, CCL-5; KB ATCC, CCL-17; HEp-2 ATCC, CCL-23; WISH ATCC, CCL- 
25; Intestine 407 ATCC, CCL-6; FL ATCC, CCL-62; AV-3 ATCC, CCL-21) were 
maintained in DMEM F-12, HEPES (Gibco) media supplemented with fetal bovine 
serum (FBS) to 10% and a 1× final concentration of pen-strep antibiotic (Gibco).
Shotgun sequencing, alignment and variant calling. All shotgun libraries were 
constructed using standard ligation chemistry methods and sequenced on an 
Illumina HiSeq 2000. Reads were aligned to the human reference genome (hg19, 
b37) using BWA followed by duplicate removal, quality score recalibration and
local indel realignment using GATK. SNVs were called using samtools, indel
variants were called using GATK and short tandem repeats (STRs) were called
using LobSTR (Supplementary Note 1). Indel detection as a function of coverage
was investigated further as described in Supplementary Note 2. Gene ontology
term analysis was carried out using DAVID. Data sets used for each analysis are
depicted as a flow chart in Supplementary Fig. 48. 

Read depth copy number analysis. Shotgun reads for HeLa and Human Genome 
Diversity Project (HGDP) control genomes along with a similarly prepared control
library with a matched G + C profile were aligned using mrsFAST and processed as
described previously to generate read depth-based copy number predictions
within non-overlapping windows of singly unique nucleotide k-mers (SUNK win- 
dows; Supplementary Note 3). Copy-number calling in HeLa was carried out at 
high (approximately 1.5-kb) and low (approximately 77-kb) resolution using an 
HMM (Supplementary Note 4), and a recalibration process was then used to 
account for widespread aneuploidy (Supplementary Note 5). Short amplifications 
and deletions were identified using a sliding-window approach (Supplementary 
Note 6). Copy-number calling was also carried out on HeLa S3 at both high and low 
resolutions, as well as on the eight additional HeLa strains at low resolution, and 
profiles were compared between strains (Supplementary Note 7). Regions of LOH 
were identified using a two-state HMM that used the fraction of homozygous SNVs 
in non-repetitive regions across low-resolution copy-number windows described 
above (Supplementary Note 8). 
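
To make the two-state HMM idea concrete, the sketch below decodes a sequence of per-window homozygous-SNV fractions into heterozygous (HET) and LOH states with the Viterbi algorithm. Every numerical parameter here (the state means, the standard deviation, the probability of staying in a state) is a hypothetical placeholder chosen for illustration. The actual model and its parameters are described in Supplementary Note 8 and are not reproduced here.

```python
import math


def viterbi_loh(hom_fractions, p_stay=0.99, mean_het=0.4, mean_loh=0.95, sd=0.1):
    """Label each window HET or LOH from its fraction of homozygous SNVs.

    Emissions are modelled as Gaussians around a per-state mean, and
    transitions use a single stay/switch probability (all assumed values).
    """
    if not hom_fractions:
        return []
    states = ("HET", "LOH")
    means = {"HET": mean_het, "LOH": mean_loh}

    def log_emit(state, x):
        # Gaussian log-density up to a constant shared by both states.
        z = (x - means[state]) / sd
        return -0.5 * z * z

    def log_trans(prev, cur):
        return math.log(p_stay) if prev == cur else math.log(1.0 - p_stay)

    # Initialise with equal prior probability for both states.
    score = {s: math.log(0.5) + log_emit(s, hom_fractions[0]) for s in states}
    backpointers = []
    for x in hom_fractions[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda prev: score[prev] + log_trans(prev, cur))
            new_score[cur] = score[best_prev] + log_trans(best_prev, cur) + log_emit(cur, x)
            pointers[cur] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Trace back the most likely state path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


windows = [0.40, 0.42, 0.38, 0.96, 0.97, 0.93, 0.41, 0.39]
print(viterbi_loh(windows))
```

A high stay probability discourages the decoder from switching state on isolated noisy windows, which is the usual reason to prefer an HMM over a per-window threshold.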

Mate-pair library construction, sequencing and analysis. Library construction 
for 40-kb mate-pair libraries was carried out starting with fosmid clone DNA pooled 
within each original fosmid preparation, using a protocol similar to one described 
previously (Supplementary Note 9). Libraries of approximately 3-kb inserts were
constructed following protocols described previously (Supplementary Note 9).
After read trimming and alignment, reads were split into classes based on aligned 
orientation and insert size, and processed using sliding windows to identify regions 
of probable structural rearrangement (Supplementary Note 10). 

Fosmid pool construction, sequencing and haplotype phasing. Three replicate 
fosmid libraries were prepared as described previously, and then partitioned by
limited dilution into 96 sub-libraries. This was followed by outgrowth, barcoded
transposase-based library preparation, sequencing and alignment (Supplemen-
tary Note 11). Clone boundaries were inferred as described previously, and base
calls were made at all heterozygous variant positions as ascertained from whole-
genome shotgun sequencing. Overlapping clones were merged to consensus
haplotype blocks using an implementation of the ReFHap algorithm (Sup-
plementary Note 12). Within the majority of the HeLa genome in which haplo-
types are unequally amplified, adjacent blocks were merged to create scaffolds,
using an HMM that finds the most likely phase of neighbouring blocks given their 
shotgun allele frequencies of inherited variants (those found within the 1000 
Genomes Project, Supplementary Note 12). This produced a final set of haplotype 
scaffolds with an N50 size of 44.8 Mb, which was then used in conjunction with 
copy-number calls to estimate haplotype-resolved copy number for HeLa 
(Supplementary Note 13). Haplotype scaffolds were analysed for variant popu- 
lation frequencies to investigate the ancestral origin of phased blocks (Sup- 
plementary Note 14). Finally, overall copy numbers were compared among all 
HeLa strains sequenced in this study (Supplementary Note 15). 
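
The N50 statistic quoted for the haplotype scaffolds follows a standard definition: the length of the scaffold at which the cumulative length, summed from the longest scaffold downwards, first reaches half of the total. A minimal sketch is given below. The function name and the toy lengths are illustrative and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (arbitrary units).
print(n50([10, 20, 30, 40]))
```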

Long-read phase validation. Genomic DNA from HeLa CCL-2 was mechanically 
sheared using a Covaris G-tube column and standard microcentrifuge following 
the manufacturer’s instructions, and this produced a mean fragment size of 
approximately 10 kb. Single-molecule real-time sequencing libraries for the
Pacific Biosciences RS sequencer were prepared using the Pacific Biosciences
DNA Template Prep Kit (3-10 kb), and the resulting library was sequenced across
eight cells using a 90-min movie. Resulting base calls were aligned to the genome
with bwasw (using parameters ‘-b5 -q2 -r1 -z1’). Reads that overlapped at least two
phased SNPs were considered, excluding those within ±10 bp of an insertion or
deletion in the alignment. 
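
The read-level phase check described above can be sketched as follows. The data layout is an assumption made for illustration: phased sites are a mapping from position to the (haplotype A, haplotype B) alleles, and each read is a mapping from position to the observed base. The exclusion of sites near alignment indels is omitted for brevity.

```python
def read_agrees(read_bases, phased_sites):
    """Check one read against the predicted phase.

    Returns True if the read matches haplotype A at every covered phased
    site or haplotype B at every covered phased site, False otherwise,
    and None if the read covers fewer than two phased sites.
    """
    covered = [pos for pos in read_bases if pos in phased_sites]
    if len(covered) < 2:
        return None
    matches_a = all(read_bases[pos] == phased_sites[pos][0] for pos in covered)
    matches_b = all(read_bases[pos] == phased_sites[pos][1] for pos in covered)
    return matches_a or matches_b


def phase_concordance(reads, phased_sites):
    """Fraction of informative reads in perfect agreement with the phase."""
    calls = [read_agrees(read, phased_sites) for read in reads]
    informative = [call for call in calls if call is not None]
    if not informative:
        return None
    return sum(informative) / len(informative)


# Hypothetical phased sites and reads.
sites = {100: ("A", "G"), 200: ("C", "T"), 300: ("G", "A")}
reads = [
    {100: "A", 200: "C"},            # consistent with haplotype A
    {100: "G", 200: "T", 300: "A"},  # consistent with haplotype B
    {100: "A", 200: "T"},            # switches between haplotypes
    {100: "A"},                      # covers only one site, not informative
]
print(phase_concordance(reads, sites))
```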

Identification of putative post-aneuploidy mutations. We searched for candid- 
ate somatic post-aneuploidy mutations by taking the initial set of SNVs called 
from the shotgun sequencing data and filtering to remove probable germline 
variants. SNVs that were phased on a duplicated haplotype but that were poly- 
morphic between the two duplicated copies were identified. Common poly- 
morphisms and sequencing artefacts were removed by filtering against repeat 
annotations and control genomes (Supplementary Note 16). 

HPV-18 insertion characterization. The HPV-18 integration locus was charac- 
terized by aligning all fosmid libraries to a modified genome that included the 
HPV-18 reference genome as an additional chromosome. Interchromosomal read 
pairs, fosmid-pool coverage profiles, and copy-number calls were used to deter-
mine the repeat structure of the chromosome 8q24.21-HPV-18 integration locus.
Polymerase-chain-reaction primers were then designed to amplify the proposed 
breakpoints, and then sequencing for base-pair resolution was carried out 
(Supplementary Note 17). 

ENCODE and RNA-seq phasing. Directional, polyA+ RNA-seq data generated
in-house on HeLa S3 (Supplementary Note 18) were analysed in parallel with
publicly available ENCODE epigenomics and transcriptomics data downloaded
from the online data portal for HeLa S3, and RNA-seq data on HeLa CCL-2 (ref. 6)
(Supplementary Note 19). RNA-seq reads were aligned using TopHat and trans-
cript quantification was carried out using Cufflinks. Haplotype phasing was
performed by genotyping aligned-sequence data for all phased SNVs and assign- 
ing haplotype contributions to either peaks (epigenomics data sets) or RPKM 
(RNA-seq data sets), and then carrying out copy-number normalization 
(Supplementary Note 20). Reference bias was investigated in all tracks and 
removed in a subset to identify its impact on outlier calling (Supplementary 
Note 21). Haplotype-specific peaks were then identified in all data tracks 
(Supplementary Note 22). Finally, a meta-analysis of all data tracks was used to 
identify large regions of haplotype imbalance (Supplementary Note 23). 
The extraction of directional motion information from changing
retinal images is one of the earliest and most important processing 
steps in any visual system. In the fly optic lobe, two parallel process- 
ing streams have been anatomically described, leading from two 
first-order interneurons, L1 and L2, via T4 and T5 cells onto large, 
wide-field motion-sensitive interneurons of the lobula plate’. There- 
fore, T4 and T5 cells are thought to have a pivotal role in motion 
processing; however, owing to their small size, it is difficult to 
obtain electrical recordings of T4 and T5 cells, leaving their visual 
response properties largely unknown. We circumvent this problem 
by means of optical recording from these cells in Drosophila, using 
the genetically encoded calcium indicator GCaMP5 (ref. 2). Here we 
find that specific subpopulations of T4 and T5 cells are directionally 
tuned to one of the four cardinal directions; that is, front-to-back, 
back-to-front, upwards and downwards. Depending on their pre- 
ferred direction, T4 and T5 cells terminate in specific sublayers of 
the lobula plate. T4 and T5 functionally segregate with respect to 
contrast polarity: whereas T4 cells selectively respond to moving 
brightness increments (ON edges), T5 cells only respond to moving 
brightness decrements (OFF edges). When the output from T4 or 
T5 cells is blocked, the responses of postsynaptic lobula plate 
neurons to moving ON (T4 block) or OFF edges (T5 block) are 
selectively compromised. The same effects are seen in turning res- 
ponses of tethered walking flies. Thus, starting with L1 and L2, the 
visual input is split into separate ON and OFF pathways, and 
motion along all four cardinal directions is computed separately 
within each pathway. The output of these eight different motion 
detectors is then sorted such that ON (T4) and OFF (T5) motion 
detectors with the same directional tuning converge in the same 
layer of the lobula plate, jointly providing the input to downstream 
circuits and motion-driven behaviours. 

Most of the neurons in the fly brain are dedicated to image processing. 
The respective part of the head ganglion, called the optic lobe, consists of 
several layers of neuropile called lamina, medulla, lobula and lobula plate, 
all built from repetitive columns arranged in a retinotopic way (Fig. 1a). 
Each column houses a set of identified neurons that, on the basis of Golgi 
staining, have been described anatomically in great detail. Owing to
their small size, however, most of these columnar neurons have never 
been recorded from electrophysiologically. Therefore, their specific func- 
tional role in visual processing is still largely unknown. This fact is con- 
trasted by rather detailed functional models about visual processing 
inferred from behavioural studies and recordings from the large, electro- 
physiologically accessible output neurons of the fly lobula plate (tangen- 
tial cells). As the most prominent example of such models, the Reichardt 
detector derives directional motion information from primary sensory 
signals by multiplying the output from adjacent photoreceptors after 
asymmetric temporal filtering. This model makes a number of rather
counter-intuitive predictions all of which have been confirmed experi- 
mentally (for review, see ref. 7). Yet, the neurons corresponding to most 




Figure 1 | Directional tuning and layer-specific projection of T4 and T5 
cells. a, Schematic diagram of the fly optic lobe. In the lobula plate, motion- 
sensitive tangential cells extend their large dendrites over many hundreds of 
columns. Shown are the reconstructions of the three cells of the horizontal 
system. b, Anatomy of T4 and T5 cells, as drawn from Golgi-impregnated
material (from ref. 5). c, Confocal image of the Gal4-driver line R42F06, shown
in a horizontal cross-section (from ref. 10). Neurons are marked in green
(Kir2.1-EGFP labelled), whereas the neuropile is stained in purple by an
antibody against the postsynaptic protein Dlg. Scale bar, 20 µm. d, Two-photon
image of the lobula plate of a fly expressing GCaMP5 under the control of the
same driver line R42F06. Scale bar, 5 µm. The size and orientation of the image
approximately correspond to the yellow square in c. e, Relative fluorescence
changes (ΔF/F) obtained during 4-s grating motion along the four cardinal
directions, overlaid on the greyscale image. Each motion direction leads to
activity in a different layer. Minimum and maximum ΔF/F values were 0.3 and
1.0 (horizontal motion), and 0.15 and 0.6 (vertical motion). f, Compound
representation of the results obtained from the same set of experiments. Scale
bar, 5 µm. Results in e and f represent the data obtained from a single fly
averaged over four stimulus repetitions. Similar results were obtained from six 
other flies. 


1Max Planck Institute of Neurobiology, 82152 Martinsried, Germany. 2 Janelia Farm Research Campus, Ashburn, Virginia 20147, USA. 3Institute of Molecular Pathology, 1030 Vienna, Austria. +Present 


address: Institute Biology 1, Albert-Ludwigs University, 79085 Freiburg, Germany. 
*These authors contributed equally to this work. 


212 | NATURE | VOL 500 | 8 AUGUST 2013 


©2013 Macmillan Publishers Limited. All rights reserved 


of the circuit elements of the Reichardt detector have not been iden- 
tified so far. Here, we focus on a set of neurons called T4 and T5 cells 
(Fig. 1b) which, on the basis of circumstantial evidence, have long been 
speculated to be involved in motion detection. However, it is
unclear to what extent T4 and T5 cells are directionally selective or 
whether direction selectivity is computed or enhanced within the den- 
drites of the tangential cells. Another important question concerns the 
functional separation between T4 and T5 cells; that is, whether they 
carry equivalent signals, maybe one being excitatory and the other 
inhibitory on the tangential cells, or whether they segregate into 
directional- and non-directional pathways or into separate ON-
and OFF-motion channels.
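
For readers unfamiliar with the Reichardt detector mentioned above, the following sketch shows the core computation: the signal from one input is delayed and multiplied with the undelayed signal from its neighbour, and the mirror-symmetric product is subtracted. The first-order low-pass filter used as the delay, its coefficient and the sinusoidal test stimulus are all assumptions made for illustration. The sketch does not model any specific neuron discussed in the text.

```python
import math


def low_pass(signal, alpha):
    """First-order low-pass filter, used here as the delaying temporal filter."""
    filtered, state = [], 0.0
    for value in signal:
        state += alpha * (value - state)
        filtered.append(state)
    return filtered


def reichardt(left, right, alpha=0.2):
    """Opponent correlation-type detector output for two adjacent inputs.

    Positive values signal motion from the left input towards the right input.
    """
    delayed_left = low_pass(left, alpha)
    delayed_right = low_pass(right, alpha)
    return [dl * r - dr * l
            for dl, r, dr, l in zip(delayed_left, right, delayed_right, left)]


def mean_response(left, right, alpha=0.2):
    """Time-averaged detector output."""
    output = reichardt(left, right, alpha)
    return sum(output) / len(output)


def drifting_sine(n_samples, period, phase):
    """Luminance seen by one input while a sine grating drifts past."""
    return [math.sin(2 * math.pi * t / period - phase) for t in range(n_samples)]


# The right input sees the same pattern a quarter cycle later: left-to-right motion.
left_input = drifting_sine(2000, 50, 0.0)
right_input = drifting_sine(2000, 50, math.pi / 2)
forward = mean_response(left_input, right_input)
backward = mean_response(right_input, left_input)
print(forward, backward)
```

Swapping the two inputs reverses the sign of the output, which is the direction selectivity that the model derives from the asymmetric temporal filtering.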

To answer these questions, we combined Gal4-driver lines specific 
for T4 and T5 cells with GCaMP5 (ref. 2) and optically recorded the
visual response properties using two-photon fluorescence microscopy.
In a first series of experiments, we used a driver line labelling both T4 
and T5 cells. A confocal image (Fig. 1c, modified from ref. 10) revealed 
clear labelling (in green) in the medulla (T4 cell dendrites), in the 
lobula (T5 cell dendrites), as well as in four distinct layers of the lobula 
plate, representing the terminal arborizations of the four subpopula- 
tions of both T4 and T5 cells. These four layers of the lobula plate can 
also be seen in the two-photon microscope when the calcium indicator 
GCaMP5 is expressed (Fig. 1d). After stimulation of the fly with grating
motion along four cardinal directions (front-to-back, back-to-front,
upwards and downwards), activity is confined to mostly one of the four
layers, depending on the direction in which the grating is moving
(Fig. 1e). The outcome of all four stimulus conditions can be combined
into a single image by assigning a particular colour to each pixel depend-
ing on the stimulus direction to which it responded most strongly
(Fig. 1f). From these experiments it is clear that the four subpopulations
of T4 and T5 cells produce selective calcium signals depending on the
stimulus direction, in agreement with previous deoxyglucose labelling.
Sudden changes of the overall luminance evoke no responses in any of
the layers (field flicker; n = 4 experiments, data not shown). However, 
gratings flickering in counter-phase lead to layer-specific responses, 
depending on the orientation of the grating (Supplementary Fig. 1). 
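
The compound image described above, in which each pixel is assigned to the stimulus direction that evoked its strongest response, amounts to a per-pixel argmax across the four response maps. A minimal sketch follows. The threshold parameter, the nested-list data layout and the toy values are assumptions for illustration and are not taken from the paper's analysis.

```python
DIRECTIONS = ("front-to-back", "back-to-front", "upwards", "downwards")


def preferred_direction_map(responses, threshold=0.0):
    """Assign each pixel the direction with the largest relative fluorescence change.

    `responses` maps each direction name to a 2D list of dF/F values of
    identical shape. Pixels whose best response does not exceed `threshold`
    are left unassigned (None).
    """
    rows = len(responses[DIRECTIONS[0]])
    cols = len(responses[DIRECTIONS[0]][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            best = max(DIRECTIONS, key=lambda d: responses[d][r][c])
            row.append(best if responses[best][r][c] > threshold else None)
        result.append(row)
    return result


# Toy 2 x 2 response maps with made-up dF/F values.
toy = {
    "front-to-back": [[0.9, 0.1], [0.0, 0.2]],
    "back-to-front": [[0.2, 0.8], [0.0, 0.1]],
    "upwards":       [[0.1, 0.1], [0.0, 0.5]],
    "downwards":     [[0.0, 0.3], [0.0, 0.4]],
}
print(preferred_direction_map(toy, threshold=0.05))
```

Each direction label can then be mapped to a display colour to produce a single colour-coded image.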

The retinotopic arrangement of this input to the lobula plate is 
demonstrated by experiments where a dark edge was moved within 
a small area of the visual field only. Depending on the position of this 
area, activity of T4 and T5 cells is confined to different positions within 
the lobula plate (Fig. 2a). Consequently, when moving a bright vertical 
edge horizontally from back to front, activity of T4 and T5 cells is 
elicited sequentially in layer 2 of the lobula plate (Fig. 2b). These two 
experiments also demonstrate that T4 and T5 cells indeed signal 
motion locally. We next investigated the question of where direction 
selectivity of T4 and T5 cells arises; that is, whether it is already present 
in the dendrite, or whether it is generated by synaptic interactions 
within the lobula plate. This question is hard to answer, as the den- 
drites of both T4 and T5 cells form a dense mesh within the proximal 
layer of the medulla (T4) and the lobula (T5), respectively. However, 
signals within the inner chiasm where individual processes of T4 and 
T5 cells can be resolved in some preparations show a clear selectivity 
for motion in one over the other directions (Fig. 2c). Such signals are as 
directionally selective as the ones measured within the lobula plate, 
demonstrating that the signals delivered from the dendrites of T4 and 
T5 cells are already directionally selective. 

To assess the particular contribution of T4 and T5 cells to the signals 
observed in the above experiments, we used driver lines specific for T4 
and T5 cells, respectively. Applying the same stimulus protocol and 
data evaluation as in Fig. 1, identical results were obtained as before 
for both the T4- as well as the T5-specific driver line (Fig. 3a, b). We 
conclude that T4 and T5 cells each provide directionally selective 
signals to the lobula plate, in contrast to previous reports. Thus, both
T4 and T5 cells can be grouped, according to their preferred direction,
into four subclasses covering all four cardinal directions, reminiscent
of ON-OFF ganglion cells of the rabbit retina.
Figure 2 | Local signals of T4 and T5 cells. a, Retinotopic arrangement of T4
and T5 cells. A dark edge was moving repeatedly from front-to-back within a
15° wide area at different azimuthal positions (left). This leads to relative
fluorescence changes at different positions along the proximal-distal axis
within layer 1 of the lobula plate (right). Scale bar, 5 µm. Similar results have
been obtained in four other flies. b, Sequential activation of T4 and T5 cells. A
bright edge was moving from back-to-front at 15° s⁻¹. Scale bar, 5 µm. Similar
results have been obtained in six other flies. c, Signals recorded from individual
fibres within the inner chiasm (left) reveal a high degree of direction selectivity
(right). Scale bar, 5 µm. Similar results were obtained from four other flies,
including both lines specific for T4 and T5 cells. Response traces in b and c are 
derived from the region of interest encircled in the image with the same colour. 


We next addressed whether T4 cells respond differently to T5 cells. 
To answer this question, we used, instead of gratings, moving edges 
with either positive (ON edge, brightness increment) or negative (OFF 
edge, brightness decrement) contrast polarity as visual stimuli. We 
found that T4 cells strongly responded to moving ON edges, but 
showed little or no response to moving OFF edges (Fig. 3c). This is 
true for T4 cells terminating in each of the four layers. We found the 
opposite for T5 cells. T5 cells selectively responded to moving OFF 
edges and mostly failed to respond to moving ON edges (Fig. 3d). 
Again, we found this for T5 cells in each of the four layers. We next 
addressed whether there are any other differences in the response 
properties between T4 and T5 cells by testing the velocity tuning of 
both cell populations by means of stimulating flies with grating motion 
along the horizontal axis from the front to the back at various velocities 
covering two orders of magnitude. T4 cells revealed a maximum response 
at a stimulus velocity of 30° s⁻¹, corresponding to a temporal 
frequency of 1 Hz (Fig. 3e). T5 cell responses showed a similar depend- 
ency on stimulus velocity, again with a peak at a temporal frequency of 
cells. a, b, Relative fluorescence changes (ΔF/F) of the lobula plate terminals of 
T4 (a) and T5 (b) cells obtained during grating motion along the four cardinal 
directions. Results represent the data obtained from a single fly each, averaged 
over two stimulus repetitions. Scale bars, 5 µm. Similar results have been 
obtained in ten other flies. c, d, Responses of T4 (c) and T5 (d) cells to ON and 
OFF edges moving along all four cardinal directions. ON (white) and OFF 
(black) responses within each layer are significantly different from each other, 
with P < 0.005 except for layers 3 and 4 in T5 cells, where P < 0.05. 

e, f, Responses of T4 (e) and T5 (f) cells to gratings moving horizontally at 
different temporal frequencies. Relative fluorescence changes were evaluated 
from layer 1 of the lobula plate and normalized to the maximum response 
before averaging. g, h, Responses of T4 (g) and T5 (h) cells to gratings moving 
in 12 different directions. Relative fluorescence changes were evaluated from all 
four layers of the lobula plate normalized to the maximum response before 
averaging. Data represent the mean ± s.e.m. of the results obtained in n = 8 
(c), n = 7 (d), n = 6 (e), n = 7 (f), n = 6 (g) and n = 5 (h) different flies. 
Significances indicated are based on two-sample t-test. 


1 Hz (Fig. 3f). Thus, there is no obvious difference in the velocity 
tuning between T4 and T5 cells. As another possibility, T4 cells might 
functionally differ from T5 cells with respect to their directional tuning 
width. To test this, we stimulated flies with gratings moving in 12 
different directions and evaluated the relative change of fluorescence in 
all four layers of the lobula plate. Using the T4-specific driver line, we 
found an approximate half width of 60-90° of the tuning curve, with 
the peak responses in each layer shifted by 90° (Fig. 3g). No decrease of 
calcium was detectable for grating motion opposite to the preferred 
direction of the respective layer. When we repeated the experiments 
using the T5-specific driver line, we found a similar dependence of the 
relative change of fluorescence on the stimulus direction (Fig. 3h). We 
conclude that T4 cells have the same velocity and orientation tuning as 
T5 cells. The only functional difference we were able to detect remains 
their selectivity for contrast polarity. 
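
The tuning width reported above can be illustrated with a short Python sketch. It estimates the width of a tuning curve at half of its maximum from responses sampled at evenly spaced directions, interpolating linearly between neighbouring samples. The function name, the interpolation scheme and the reading of "half width" as full width at half maximum are assumptions made for illustration; the analysis code used in the study is not given in the text.

```python
import numpy as np

def width_at_half_max(directions_deg, responses):
    """Width (in degrees) of a circular tuning curve at half its peak value.

    Assumes evenly spaced directions covering 360 degrees (for example 12
    directions in 30 degree steps) and a single response peak.
    """
    d = np.asarray(directions_deg, dtype=float)
    r = np.asarray(responses, dtype=float)
    order = np.argsort(d)
    r = r[order]
    n = len(r)
    step = 360.0 / n
    peak = int(np.argmax(r))
    half = r[peak] / 2.0

    def flank(sign):
        # Walk away from the peak until the response drops below half maximum.
        for k in range(1, n):
            prev = r[(peak + sign * (k - 1)) % n]
            cur = r[(peak + sign * k) % n]
            if cur < half:
                # Linear interpolation between the two neighbouring samples.
                frac = (prev - half) / (prev - cur)
                return (k - 1 + frac) * step
        return 180.0  # response never falls below half maximum

    return flank(+1) + flank(-1)

# Hypothetical triangular tuning curve peaking at 60 degrees.
directions = np.arange(0, 360, 30)
example = [0.0, 0.5, 1.0, 0.5] + [0.0] * 8
print(width_at_half_max(directions, example))
```

With only 12 sampled directions the estimate is limited by the 30° spacing, which may be one reason the width is quoted as an approximate range.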

Our finding about the different preference of T4 and T5 cells for the 
polarity of a moving contrast makes the strong prediction that selective 
blockade of T4 or T5 cells should selectively compromise the responses 
of downstream lobula plate tangential cells to either ON or OFF edges. 
To test this prediction, we blocked the output of either T4 or T5 cells 
via expression of the light chain of tetanus toxin’’ and recorded the 
responses of tangential cells via somatic whole-cell patch to moving 
ON and OFF edges. In response to moving ON edges, strong and 
reliable directional responses were observed in all control flies (Fig. 4a). 
However, T4-block flies showed a strongly reduced response to ON 
edges, whereas the responses of T5-block flies were at the level of 
control flies (Fig. 4b, c). When we used moving OFF edges, control 
flies again responded with a large amplitude (Fig. 4d). However, the 
responses of T4-block flies were at the level of control flies, whereas the 
responses of T5-block flies were strongly reduced (Fig. 4e, f). These 
findings are reminiscent of the phenotypes obtained from blocking 
lamina cells L1 and L2 (ref. 13) and demonstrate that T4 and T5 cells 
are indeed the motion-coding intermediaries for these contrast polar- 
ities on their way to the tangential cells of the lobula plate. Whether the 
residual responses to ON edges in T4-block flies and to OFF edges in 
T5-block flies are due to an incomplete signal separation between the 
two pathways or due to an incomplete genetic block in both fly lines is 
currently unclear. 

To address the question of whether T4 and T5 cells are the only 
motion detectors of the fly visual system, or whether they represent 
one cell class, in parallel to other motion-sensitive elements, we used 
tethered flies walking on an air-suspended sphere’® and stimulated 
them by ON and OFF edges moving in opposite directions’’. As in 
the previous experiments, we blocked T4 and T5 cells specifically by 
selective expression of the light chain of tetanus toxin. During balanced 
motion, control flies did not show significant turning responses to 
either side (Fig. 4g). T4-block flies, however, strongly followed the 
direction of the moving OFF edges, whereas T5-block flies followed 
the direction of the moving ON edges (Fig. 4h, i). In summary, the 
selective preference of T4-block flies for OFF edges and of T5-block 
flies for ON edges not only corroborates our findings about the selec- 
tive preference of T4 and T5 cells for different contrast polarities, but 
also demonstrates that the signals of T4 and T5 cells are indeed the 
major, if not exclusive, inputs to downstream circuits and motion- 
driven behaviours. 
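
As a rough illustration of how the balanced-motion turning responses could be combined, the sketch below follows the description in the Methods: responses are averaged over the two initial grating positions, and the difference between the two edge directions is then taken. The dictionary layout, the condition labels, the sign convention and the factor of one half are hypothetical choices, not the study's actual code.

```python
import numpy as np

def balanced_motion_response(traces):
    """Combine the four balanced-motion conditions into one turning trace.

    traces maps (edge_direction, start_position) to a 1-D array of turning
    velocity over time. edge_direction is "on_right" (ON edges move right,
    OFF edges move left) or "on_left"; start_position is 0 or 1.
    Positive values are taken to mean rightward turning (an assumption).
    """
    per_direction = {}
    for direction in ("on_right", "on_left"):
        # Average over the two initial grating positions.
        per_direction[direction] = np.mean(
            [np.asarray(traces[(direction, s)], dtype=float) for s in (0, 1)],
            axis=0,
        )
    # Half the difference between the two edge directions: positive values
    # mean the fly turned with the ON edges, negative with the OFF edges.
    return 0.5 * (per_direction["on_right"] - per_direction["on_left"])

# Hypothetical traces for one fly.
example = {
    ("on_right", 0): [1.0, 1.0],
    ("on_right", 1): [3.0, 3.0],
    ("on_left", 0): [-2.0, -2.0],
    ("on_left", 1): [-2.0, -2.0],
}
print(balanced_motion_response(example))
```

Under this convention a fly that follows neither edge type gives values near zero, as reported for control flies.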

Almost a hundred years after T4 and T5 cells were anatomically 
described’, this study reports their functional properties in a 
systematic way. Using calcium as a proxy for membrane voltage”’, we 
found that both T4 and T5 cells respond to visual motion in a direc- 
tionally selective manner and provide these signals to each of the four 
layers of the lobula plate, depending on their preferred direction. Both 
cell types show identical velocity and orientation tuning, which 
matches that of the tangential cells*’’*. The strong direction selectivity 
of both T4 and T5 cells is unexpected, as previous studies had 
concluded that the high degree of direction selectivity of tangential 
cells is due to a push-pull configuration of weakly directional input 
with opposite preferred direction****. Furthermore, as the preferred 
direction of T4 and T5 cells matches the preferred direction of the 
tangential cells branching within corresponding layers, it is currently 
unclear which neurons are responsible for the null-direction response 
of the tangential cells. As for the functional separation between T4 and 
T5 cells, we found that T4 cells selectively respond to brightness incre- 
ments, whereas T5 cells exclusively respond to moving brightness decre- 
ments. Interestingly, parallel ON and OFF motion pathways had been 
previously postulated on the basis of selective silencing of lamina neu- 
rons L1 and L2 (ref. 13). Studies using apparent motion stimuli to 
probe the underlying computational structure arrived at controversial 
conclusions: whereas some studies concluded that there was a separate 
handling of ON and OFF events by motion detectors'*”*”*, others did 
not favour such a strict separation’’’’. The present study directly 
demonstrates the existence of separate ON and OFF motion detectors, 
as represented by T4 and T5 cells, respectively. Furthermore, our results 
anatomically confine the essential processing steps of elementary 
Figure 4 | Voltage responses of lobula plate tangential cells and turning 
responses of walking flies to moving ON and OFF edges. a, d, Average time 
course of the membrane potential in response to preferred direction motion 
minus the response to null direction motion (PD − ND response) as recorded 
in three types of control flies (stimulation period indicated by shaded area). 
b, e, Same as in a, d, but recorded in T4-block flies (green) and T5-block flies 
(red). The stimulus pattern, shown to the left, consisted of multiple ON- (a) or 
OFF-edges (d). c, f, Mean voltage responses (PD − ND) of tangential cells in 
the five groups of flies. Recordings were done from cells of the vertical”' and the 
horizontal” system. Because no difference was detected between them, data 
were pooled. Data comprise recordings from n = 20 (TNT control), n = 12 (T4 
control), n = 16 (T5 control), n = 17 (T4 block) and n = 18 (T5 block) cells. In 
both T4- and T5-block flies, ON and OFF responses are significantly different 


motion detection—that is, asymmetric temporal filtering and non- 
linear interaction—to the neuropile between the axon terminals of 
lamina neurons L1 and L2 (ref. 28) and the dendrites of directionally 
selective T4 and T5 cells (Supplementary Fig. 2). The dendrites of T4 
and T5 cells might well be the place where signals from neighbouring 
columns interact in a nonlinear way, similar to the dendrites of star- 
burst amacrine cells of the vertebrate retina”. 


METHODS SUMMARY 

Flies. Flies used in calcium imaging experiments (Figs 1-3) had the following 
genotypes: T4/T5 line (w⁻; +/+; UAS-GCaMP5,R42F06-GAL4/UAS-GCaMP5, 
R42F06-GAL4), T4 line (w⁻; +/+; UAS-GCaMP5,R54A03-GAL4/UAS-GCaMP5, 
R54A03-GAL4), T5 line (w⁻; +/+; UAS-GCaMP5,R42H07-GAL4/UAS-GCaMP5, 
R42H07-GAL4). Flies used in electrophysiological and behavioural experiments 
(Fig. 4) had identical genotypes of the following kind: TNT control flies (w⁺/w⁺; 
UAS-TNT-E/UAS-TNT-E; +/+), T4 control flies (w⁺/w⁻; +/+; VT37588-GAL4/ 
+), T5 control flies (w⁺/w⁻; +/+; R42H07-GAL4/+), T4-block flies (w⁺/w⁻; 
UAS-TNT-E/+; VT37588-GAL4/+), T5-block flies (w⁺/w⁻; UAS-TNT-E/+; 
R42H07-GAL4/+). 

Two-photon microscopy. We used a custom-built two-photon laser scanning 
microscope *’ equipped with a ×40 water immersion objective and a mode locked 
Ti:sapphire laser. To shield the photomultipliers from the stimulus light, two 
separate barriers were used: the first was placed directly over the LEDs, the second 
extended from the fly holder over the arena. Images were acquired at a resolution 
of 256 × 256 pixels and a frame rate of 1.87 Hz, except where indicated, using 
ScanImage software”. 


from each other with P < 0.001. In T4-block flies, ON responses are 
significantly reduced compared to all three types of control flies, whereas in 
T5-block flies, OFF responses are significantly reduced, both with P < 0.001. 

g, Average time course of the turning response of three types of control flies to 
ON and OFF edges moving simultaneously in opposite directions (stimulation 
period indicated by shaded area). h, Same as in g, but recorded from T4-block 
flies (green) and T5-block flies (red). i, Mean turning tendency (±s.e.m.) 
during the last second of the stimulation period averaged across all flies within 
each group. Data comprise average values obtained in n = 12 (TNT controls), 
n = 11 (T4 controls), n = 11 (T5 controls), n = 13 (T4 block) and n = 12 (T5 
block) flies. Values of T4- and T5-block flies are highly significantly different from 
zero with P < 0.001. Significances indicated are based on two-sample t-test. 


Electrophysiology. Recordings were established under visual control using a Zeiss 
microscope and a ×40 water immersion objective. 

Behavioural analysis. The locomotion recorder was custom-designed according 
to ref. 18. It consisted of an air-suspended sphere floating in a bowl-shaped sphere 
holder. Motion of the sphere was recorded by two optical tracking sensors. 
Visual stimulation. For calcium imaging and electrophysiological experiments, 
we used a custom-built LED arena covering 180° and 90° of the visual field along 
the horizontal and the vertical axis, respectively, at 1.5° resolution. For the beha- 
vioural experiments, three 120-Hz LCD screens formed a U-shaped visual arena 
with the fly in the centre, covering 270° and 114° of the visual field along the 
horizontal and the vertical axes, respectively, at 0.1° resolution. 

Data evaluation. Data were evaluated off-line using custom-written software 
(Matlab and IDL). 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


Flies. Flies were raised on standard cornmeal-agar medium at 25 °C and 60% 
humidity throughout development on a 12 h light/12 h dark cycle. For calcium 
imaging, we used the genetically encoded single-wavelength indicator GCaMP5, 
variant G, with the following mutations: T302L, R303P and D380Y (ref. 2). 
Expression of GCaMP5 was directed by three different Gal4 lines, all from the 
Janelia Farm collection’. Flies used in calcium imaging experiments (Figs 1-3) 
had the following genotypes: T4/T5 line (w⁻; +/+; UAS-GCaMP5,R42F06-GAL4/ 
UAS-GCaMP5,R42F06-GAL4), T4 line (w⁻; +/+; UAS-GCaMP5,R54A03-GAL4/ 
UAS-GCaMP5,R54A03-GAL4), T5 line (w⁻; +/+; UAS-GCaMP5,R42H07- 
GAL4/UAS-GCaMP5,R42H07-GAL4). All driver lines were generated by the 
methods described in ref. 14 and were identified by screening a database of imaged 
lines, followed by reimaging of selected lines*’. Being homozygous for both the 
Gal4 driver and the UAS-GCaMP5 genes, T4 flies also showed some residual expression 
in T5 cells, and T5 flies also in T4 cells. This unspecific expression, however, was in 
general less than 25% of the expression in the specific cells. Flies used in 
electrophysiological and behavioural experiments (Fig. 4) had identical genotypes of the 
following kind: TNT control flies (w⁺/w⁺; UAS-TNT-E/UAS-TNT-E; +/+), T4 
control flies (w⁺/w⁻; +/+; VT37588-GAL4/+), T5 control flies (w⁺/w⁻; +/+; 
R42H07-GAL4/+), T4-block flies (w⁺/w⁻; UAS-TNT-E/+; VT37588-GAL4/+), 
T5-block flies (w⁺/w⁻; UAS-TNT-E/+; R42H07-GAL4/+). UAS-TNT-E flies 
were derived from the Bloomington Stock Center (stock no. 28837) and VT37588- 
Gal4 flies were derived from the VDRC (stock no. 205893). Before 
electrophysiological experiments, flies were anaesthetized on ice and waxed on a Plexiglas 
holder using beeswax. The dissection of the fly cuticle and exposure of the lobula 
plate were performed as described previously (for imaging experiments, see ref. 32; 
for electrophysiology, see ref. 21). Flies used in behavioural experiments were 
taken from 18 °C just before the experiment and immediately cold-anaesthetized. 
The head, the thorax and the wings were glued to a needle using near-ultraviolet 
bonding glue (Sinfony Opaque Dentin) and strong blue LED light (440 nm, dental 
curing-light, New Woodpecker). 

Two-photon microscopy. We used a custom-built two-photon laser scanning 
microscope**’ equipped with a ×40 water immersion objective (0.80 NA, IR- 
Achroplan; Zeiss). Fluorescence was excited by a mode locked Ti:sapphire laser 
(<100 fs, 80 MHz, 700-1,020 nm; pumped by a 10 W CW laser; both Mai Tai; 
Spectraphysics) with a DeepSee accessory module attached for dispersion 
compensation control resulting in better pulse compression and fluorescence at the 
target sample. Laser power was adjusted to 10-20 mW at the sample, and an excitation 
wavelength of 910 nm was used. The photomultiplier tube (H10770PB-40, 
Hamamatsu) was equipped with a dichroic band-pass mirror (520/35, Brightline). 
Images were acquired at a resolution of 256 × 256 pixels and a frame rate of 
1.87 Hz, except in Fig. 2 (7.5 Hz), using the ScanImage software”. 

Electrophysiology. Recordings were established under visual control using a ×40 
water immersion objective (LumplanF, Olympus), a Zeiss microscope (Axiotech 
vario 100, Zeiss), and illumination (100 W fluorescence lamp, hot mirror, neutral 
density filter OD 0.3; all from Zeiss). To enhance tissue contrast, we used two 
polarization filters, one located as an excitation filter and the other as an emission 
filter, with slight deviation on their polarization plane. For eye protection, we 
additionally used a 420-nm LP filter on the light path. 

Behavioural analysis. The locomotion recorder was custom-designed according 
to ref. 18. Briefly, it consists of an air-suspended sphere floating in a bowl-shaped 
sphere holder. A high-power infrared LED (800 nm, JET series, 90 mW, Roithner 
Electronics) is located in the back to illuminate the fly and the sphere surface. Two 
optical tracking sensors are equipped with lens and aperture systems to focus on 
the sphere behind the fly. The tracking data are processed at 4 kHz internally, read 
out via a USB interface and processed by a computer at ~200 Hz. This allows real- 
time calculation of the instantaneous rotation axis of the sphere. A third camera 
(GRAS-20S4M-C, Point Grey Research) is located in the back which is essential for 
proper positioning of the fly and allows real-time observation and video recording 
of the fly during experiments. 

Visual stimulation. For calcium imaging and electrophysiological experiments, 
we used a custom-built LED arena that allowed refresh rates of up to 550 Hz and 16 
intensity levels. It covered 180° (1.5° resolution) and 90° (1.5° resolution) of the 
visual field along the horizontal and the vertical axis, respectively. The LED arena 
was engineered and modified based upon ref. 34. The LED array consists of 7 × 4 
individual TA08-81GWA dot-matrix displays (Kingbright), each harbouring 
8 × 8 individual green (568 nm) LEDs. Each dot-matrix display is controlled by 
an ATmega168 microcontroller (Atmel) combined with a ULN2804 line driver 
(Toshiba America) acting as a current sink. All panels are in turn controlled via an 
I2C interface by an ATmega128 (Atmel)-based main controller board, which 
reads in pattern information from a compact flash (CF) memory card. Matlab 
was used for programming and generation of the patterns as well as for sending 
the serial command sequences via RS-232 to the main controller board. The 
luminance range of the stimuli was 0.5-33 cd m⁻². For the calcium imaging 
experiments, two separate barriers were used to shield the photomultipliers from 
the stimulus light coming from the LED arena. The first was a spectral filter with 
transparency to wavelengths >540 nm placed directly over the LEDs (ASF SFG 10, 
Microchemicals). The second was a layer of black PVC extending from the fly 
holder over the arena. Square wave gratings had a spatial wavelength of 30° of 
visual angle and a contrast of 88%. Unless otherwise stated, they were moving at 
30° s⁻¹. Edges had the same contrast and were also moving at 30° s⁻¹. For the 
experiments shown in Figs 1, 2b and 3, each grating or edge motion was shown 
twice within a single sweep, resulting in a total of eight stimulation periods. Each 
stimulus period lasted 4 s, and subsequent stimuli were preceded by a 3-s pause. In 
the experiment shown in Fig. 2a, a dark edge of 88% contrast was moved for 1 s at 
15° s⁻¹ from the front to the back at three different positions (22°, 44°, 66°, from 
frontal to lateral). At each position, edge motion was repeated 15 times. For the 
experiment shown in Fig. 2b, a bright edge of 88% contrast was moving at 15° s⁻¹ 
from the back to the front, and images were acquired at a frame rate of 7.5 Hz. For 
the experiments shown in Figs 3e, f, all six stimulus velocities were presented once 
within one sweep, with the stimulus lasting 4 s, and different stimuli being separated 
by 2 s. In the experiments shown in Figs 3g, h, a single sweep contained all 12 
grating orientations with the same stimulus and pause length as above. For the 
electrophysiology experiments (Fig. 4a-f), multiple edges were used as stimuli 
moving simultaneously at 50° s⁻¹. To stimulate cells of the horizontal system (HS 
cells), a vertical, stationary square-wave grating with 45° spatial wavelength was 
presented. For ON-edge motion, the right (preferred direction, PD) or the left edge 
(null direction, ND) of each light bar started moving until it merged with the 
neighbouring bar. For OFF-edge motion, the right or the left edge of each dark 
bar was moving. To stimulate cells of the vertical system (VS cells), the pattern 
was rotated by 90° clockwise. For the behavioural experiments (Fig. 4g-i), three 
120-Hz LCD screens (Samsung 2233 RZ) were vertically arranged to form a 
U-shaped visual arena (w = 31 cm × d = 31 cm × h = 47 cm) with the fly in 
the centre. The luminance ranged from 0 to 131 cd m⁻², and the arena covered large parts 
of the flies’ visual field (horizontal, ±135°; vertical, ±57°; resolution, <0.1°). 
The three LCD screens were controlled via NVIDIA 3D Vision Surround Tech- 
nology on Windows 7 64-bit allowing a synchronized update of the screens 
at 120 frames per second. Visual stimuli were created using Panda3D, an open- 
source gaming engine, and Python 2.7, which simultaneously controlled the 
frame rendering in Panda3D, read out the tracking data and temperature and 
streamed data to the hard disk. The balanced motion stimulus consisted of a 
square-wave grating with 45° spatial wavelength and a contrast of 63%. Upon 
stimulation onset, dark and bright edges moved in opposite directions at 10° s⁻¹ 
for 2.25 s. This stimulation was performed for both possible edge directions and 
two initial grating positions shifted by half a wavelength, yielding a total of four 
stimulus conditions. 
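
The stimulus parameters above are linked by simple arithmetic, which the following Python sketch makes explicit. It relates grating velocity and spatial wavelength to temporal frequency (30° s⁻¹ and 30° give the 1 Hz quoted in the main text) and computes how long an edge moving at 50° s⁻¹ needs to cross one bar of a 45° grating. The function names are illustrative, and the bar width of half a spatial wavelength is an assumption, although the resulting time agrees with the 0.45 s of edge motion mentioned under Data evaluation.

```python
def temporal_frequency_hz(velocity_deg_per_s, wavelength_deg):
    """Temporal frequency of a drifting grating: velocity / spatial wavelength."""
    return velocity_deg_per_s / wavelength_deg

def edge_travel_time_s(wavelength_deg, velocity_deg_per_s):
    """Time for an edge to cross one bar, assuming bars half a wavelength wide."""
    return (wavelength_deg / 2.0) / velocity_deg_per_s

def frames_per_period(duration_s, frame_rate_hz):
    """Number of imaging frames (possibly fractional) within one stimulus period."""
    return duration_s * frame_rate_hz

# Calcium imaging gratings: 30 deg/s with a 30 deg spatial wavelength.
print(temporal_frequency_hz(30.0, 30.0))  # 1.0
# Electrophysiology edges: 45 deg grating, edges moving at 50 deg/s.
print(edge_travel_time_s(45.0, 50.0))     # 0.45
# A 4 s stimulus period imaged at 1.87 Hz spans only a handful of frames.
print(frames_per_period(4.0, 1.87))
```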

Data evaluation. Data were evaluated off-line using custom-written software 
(Matlab and IDL). For the images shown in Figs 1e, f, 2a and 3a, b, the raw image 
series was converted into four images representing the relative fluorescence change 
during each direction of grating motion: $(\Delta F/F)_{\mathrm{stim}} = (F_{\mathrm{stim}} - F_{\mathrm{ref}})/F_{\mathrm{ref}}$. The image 
representing the stimulus fluorescence ($F_{\mathrm{stim}}$) was obtained by averaging all images 
during stimulation; the image representing the reference fluorescence ($F_{\mathrm{ref}}$) 
was obtained by averaging three images before stimulation. Both images were 
smoothed using a Gaussian filter of 10 pixel half-width. For the images shown 
in Figs 1f and 3a, b, ΔF/F images were normalized by their maximum value. Then, 
a particular colour was assigned to each pixel according to the stimulus direction 
during which it reached maximum value, provided it passed a threshold of 25%. 
Otherwise, it was assigned to background. The response strength of each pixel was 
coded as the saturation of that particular colour. For the data shown in Figs 2b, c 
and 3c-h, the raw image series was first converted into a ΔF/F series by using the 
first three images as reference. Then, a region was defined within a raw image, and 
average ΔF/F values were determined within that region for each image, resulting 
in a ΔF/F signal over time. Responses were defined as the maximum ΔF/F value 
reached during each stimulus presentation minus the average ΔF/F value during 
the two images preceding the stimulus. For the bar graphs shown in Fig. 4c, f, the 
average voltage responses during edge motion (0.45 s) along the cell’s preferred 
(PD) and null direction (ND) were calculated. For each recorded tangential cell, 
the difference between the PD and the ND response was determined, and these 
values were averaged across all recorded cells. The data shown in Fig. 4g, h were 
obtained from the four stimulus conditions by averaging the turning responses for 
the two starting positions of the grating and calculating the mean difference 
between the turning responses for the two edge directions. For the bar graph 
shown in Fig. 4i, the average turning response of each fly during the last second 
of balanced motion stimulation was calculated. These values were averaged across 
all recorded flies within each genotype. 
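
The response measures described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the custom Matlab and IDL software used in the study: the function names, argument names and frame-index convention are hypothetical, and Gaussian smoothing and region-of-interest selection are omitted.

```python
import numpy as np

def delta_f_over_f(series, n_reference=3):
    """Convert a raw fluorescence series (frames first) into a dF/F series.

    The reference is the mean of the first n_reference frames, as described
    for the time-series analysis.
    """
    series = np.asarray(series, dtype=float)
    f_ref = series[:n_reference].mean(axis=0)
    return (series - f_ref) / f_ref

def stimulus_response(dff_trace, onset, offset, n_baseline=2):
    """Maximum dF/F during frames [onset, offset) minus the mean dF/F of the
    n_baseline frames immediately preceding stimulus onset."""
    trace = np.asarray(dff_trace, dtype=float)
    if onset < n_baseline:
        raise ValueError("not enough frames before stimulus onset")
    peak = trace[onset:offset].max()
    baseline = trace[onset - n_baseline:onset].mean()
    return peak - baseline

def mean_pd_minus_nd(pd_responses, nd_responses):
    """Per-cell PD minus ND voltage response, averaged across cells."""
    pd = np.asarray(pd_responses, dtype=float)
    nd = np.asarray(nd_responses, dtype=float)
    return float(np.mean(pd - nd))

# Hypothetical region-averaged fluorescence trace with a stimulus on frames 4-6.
raw = [10.0, 10.0, 10.0, 10.0, 12.0, 15.0, 11.0]
dff = delta_f_over_f(raw)
print(stimulus_response(dff, onset=4, offset=7))
# Hypothetical PD and ND responses (mV) of two tangential cells.
print(mean_pd_minus_nd([10.0, 12.0], [-2.0, 0.0]))
```

Subtracting the pre-stimulus baseline in `stimulus_response` keeps slow drifts in the ΔF/F trace from being counted as a response.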
Generation of inner ear sensory epithelia from 
pluripotent stem cells in 3D culture 


Karl R. Koehler, Andrew M. Mikosz, Andrei I. Molosh, Dharmeshkumar Patel & Eri Hashino 


The inner ear contains sensory epithelia that detect head move- 
ments, gravity and sound. It is unclear how to develop these sens- 
ory epithelia from pluripotent stem cells, a process that will be 
critical for modelling inner ear disorders or developing cell-based 
therapies for profound hearing loss and balance disorders’. So far, 
attempts to derive inner ear mechanosensitive hair cells and sens- 
ory neurons have resulted in inefficient or incomplete phenotypic 
conversion of stem cells into inner-ear-like cells*’. A key insight 
lacking from these previous studies is the importance of the non- 
neural and preplacodal ectoderm, two critical precursors during 
inner ear development*'. Here we report the stepwise differenti- 
ation of inner ear sensory epithelia from mouse embryonic stem 
cells (ESCs) in three-dimensional culture’”’*. We show that by 
recapitulating in vivo development with precise temporal control 
of signalling pathways, ESC aggregates transform sequentially into 
non-neural, preplacodal and otic-placode-like epithelia. Notably, 
in a self-organized process that mimics normal development, vesi- 
cles containing prosensory cells emerge from the presumptive otic 
placodes and give rise to hair cells bearing stereocilia bundles and a 
kinocilium. Moreover, these stem-cell-derived hair cells exhibit 
functional properties of native mechanosensitive hair cells and 
form specialized synapses with sensory neurons that have also 
arisen from ESCs in the culture. Finally, we demonstrate how these 
vesicles are structurally and biochemically comparable to deve- 
loping vestibular end organs. Our data thus establish a new in vitro 
model of inner ear differentiation that can be used to gain deeper 
insight into inner ear development and disorder. 

During neurulation in vivo, the definitive ectoderm is subdivided into 
the neuroectoderm and non-neural ectoderm, the latter of which gives 
rise to the inner ear (Supplementary Fig. 1a). Recent studies have 
demonstrated how organogenesis of complex neuroectoderm tissues 
such as the cerebral cortex and retina can be faithfully reconstituted in 
vitro by culturing ESCs as a floating aggregate in serum-free media 
(serum-free floating culture of embryoid body-like aggregates with quick 
reaggregation; SFEBq culture)'*'*”’. As the inner ear shares a common 
precursor with these tissues, the definitive ectoderm, we proposed that 
SFEBq culture could be redirected to generate inner ear epithelia using 
carefully timed morphogenetic cues (Fig. 1a and Supplementary Fig. 1b). 
Led by previous studies, we identified a definitive-ectoderm-like epithe- 
lium on day 3 of SFEBq culture, before the expression of neuroectoderm- 
associated proteins on day 5 (Supplementary Fig. 1c-j)'*"®. During early 
embryogenesis, activation of bone morphogenetic protein (BMP) signalling 
is critical for induction of the non-neural ectoderm from the 
definitive ectoderm epithelium'"””. Consistent with this role, in aggregates 
treated with human BMP4 (hereafter BMP), the non-neural ectoderm 
marker Dlx3 was upregulated, whereas the neuroectoderm marker 
Sox1 was downregulated (Supplementary Fig. 1k, l). However, BMP-treated 
aggregates also expressed the mesendoderm marker brachyury 
(also known as T), indicating the undesirable induction of mesoderm 
or endoderm cell types (Fig. 1b-h and Supplementary Fig. 2b)'*. To 


suppress aberrant mesendoderm induction, we combined BMP treatment 
with the transforming growth factor-β (TGF-β) inhibitor SB-431542 
(SB; Fig. 1a)'°. A combined treatment of SB and BMP (BMP/SB) 
on day 3 completely abolished brachyury⁺ cells in the outer epithelium 
(Fig. 1d, e, h; see also Supplementary Discussion)!*"*””. 

To test whether BMP/SB treatment indeed induced non-neural ecto- 
derm, we assessed the cellular composition of BMP/SB-treated aggregates 
by immunofluorescence at differentiation day 5. Notably, expression of 
the non-neural ectoderm marker activator protein 2 (AP2, also known as 
Tfap2a) was found predominantly in the E-cadherin (Ecad, also known 
as Cdh1)* outer epithelium, but was absent in other regions of treated 
aggregates (Fig. li, j). Moreover, we identified an intermediate layer of 
Figure 1 | Non-neural and preplacodal ectoderm induction in three-dimensional 
culture. a, Non-neural ectoderm induction strategy. d, day; 
de, definitive ectoderm; me, mesendoderm; nne, non-neural ectoderm. 
b-d, Morphology of control (Ctrl), BMP and BMP/SB aggregates. DIC, 
differential interference contrast. e, SB decreases the level of brachyury 
expression induced by BMP (n = 3; **P < 0.01; mean ± s.e.m.). 
f-h, Brachyury⁺ cells are less prevalent in BMP/SB aggregates. DAPI, 
4',6-diamidino-2-phenylindole. i-k, BMP/SB aggregates contain an outer 
AP2⁺ Ecad⁺ epithelium and an interior Sox1⁺ Ncad⁺ cell layer. l, BMP/SB 
aggregate composition on day 5. ne, neuroectoderm. m, Preplacodal ectoderm 
induction strategy. epi, epidermis; ppe, preplacodal ectoderm. n-p, BMP/SB- 
FGF/LDN (B/S-F/L) aggregates are distinguished by a thickened AP2⁺ 
epithelium absent in other conditions. Scale bars, 100 µm. 
each aggregate with Sox1⁺ and N-cadherin (Ncad, also known as 
Cdh2)⁺ cells, indicative of the formation of neuroectoderm (Fig. 1j, 
k and Supplementary Fig. 1h). In addition, the pluripotency marker 
Nanog was confined to cells at the core of each aggregate (Sup- 
plementary Fig. 2g). Altogether, these data strongly suggest that the 
outer epithelium of day 5 BMP/SB-treated aggregates represents non- 
neural ectoderm, which surrounds an interior layer containing a mix- 
ture of mesendodermal and neuroectodermal tissues and a central core 
of pluripotent cells (Fig. 1l and Supplementary Fig. 2). In support of 
this conclusion, the outer epithelium of BMP/SB samples develops into 
a Krt5⁺ p63 (also known as Trp63)⁺ epithelium, mimicking embryonic 
development of the epidermis (Supplementary Fig. 3). 

The preplacodal region, a contiguous band of embryonic head ecto- 
derm, arises from the non-neural ectoderm at the neural tube border and 
is the precursor to all of the cranial placodes (Supplementary Fig. 4a)"*. 
Although BMP signalling is required for induction of non-neural ecto- 
derm, recent studies suggest that subsequent BMP inhibition, along 
with active fibroblast growth factor (FGF) signalling, is necessary 
for non-neural cells to select a preplacodal over an epidermal fate 
(Fig. 1m)’?°?!. With this in mind, we began treating BMP/SB aggre- 
gates with various combinations of the specific BMP inhibitor LDN- 
193189 (LDN) and human FGF2. We found that BMP/SB aggregates 
treated with LDN on day 4.5 maintained expression of Dlx3, indicating 
that BMP inhibition after non-neural induction does not reverse cell 
fate specification (Supplementary Fig. 4c). Consistent with the prepla- 
codal ectoderm being thickened relative to the surrounding surface 
Figure 2 | Otic induction from the preplacodal epithelium in vitro. a, OEPD 
induction in mice. nc, neural crest; se, surface ectoderm. b, c, Pax8 (b) and Pax2 
(c) messenger RNA expression on day 8 (n = 3-4; *P < 0.05, **P < 0.01; 
mean ± s.e.m.). d-g, Pax8 and Ecad expression in BMP/SB aggregates (d, f) and 
BMP/SB-FGF/LDN aggregates (e, g) on days 6 and 8. Arrowheads indicate vesicles. 
h, Day 12 BMP/SB-FGF/LDN aggregate with Pax2⁺ Ecad⁺ vesicles (arrowheads). 
i-k, Pax2⁺ Ecad⁺ (i), Pax2⁺ Pax8⁺ (j) and Sox2⁺ Ecad⁺ (k) vesicles invaginate 
from the inner epithelium from days 9-12. l, XAV939 decreases the number of 
vesicles expressing Pax2 and Ecad, Pax2 and Pax8, and Pax8 and Sox2 on day 12 
(n = 9 aggregates; **P < 0.01, ***P < 0.001; mean ± s.e.m.). m, Self-guided, 
inside-out rearrangement of BMP/SB-FGF/LDN aggregates and formation of otic 
vesicles. Scale bars, 100 µm (d-f, h), 50 µm (i-k) and 25 µm (g). 
ectoderm*”’, we found thickened patches of epithelia in BMP/SB/LDN 
aggregates that were not present in BMP/SB epithelia (Sup- 
plementary Fig. 4b, h). As observed in vivo, this morphological change 
appeared to be dependent on endogenous FGFs, as inhibition of FGF 
signalling by the small molecule SU5402 abolished epithelial thick- 
ening (Supplementary Fig. 5a). A combined treatment of recombinant 
FGF2 and LDN (hereafter BMP/SB-FGF/LDN) significantly increased 
the thickness of the epithelium compared to BMP/SB and BMP/SB- 
LDN aggregates (Fig. lIn—p and Supplementary Fig. 4h). Notably, in >95% 
of BMP/SB-FGF/LDN aggregates, a thickened Gata3” Sixl1* AP2* 
epithelium ruffled and formed ovoid vesicles between days 6 and 8 
(Fig. lp and Supplementary Fig. 4d-j). These and the following data 
strongly suggest that the outer epithelium of BMP/SB-FGF/LDN- 
treated aggregates is representative of preplacodal ectoderm. 

In vertebrates, the otic placode is derived from a posterior prepla- 
codal region known as the otic-epibranchial placode domain (OEPD; 
Fig. 2a). The otic placode is demarcated from other developing pla- 
codes by expression of the transcription factors Pax2 and Pax8 (see 
Supplementary Fig. 6 for the in vivo situation)”. Because the induction 
of the OEPD requires FGF signalling and the otic placode epithelium 
thickens, invaginates and forms the otic vesicle’®, we examined 
whether the vesicle-forming epithelia of BMP/SB-FGF/LDN aggre- 
gates were representative of the primordial inner ear. Our quantitative 
PCR analysis revealed that Pax2 and Pax8 were significantly upregu- 
lated in BMP/SB-FGF/LDN samples compared to other conditions 
(Fig. 2b, c). By day 6, we observed Pax8+ cells distributed in
placode-like patches throughout the outer Ecad+ epithelium of
BMP/SB-FGF/LDN aggregates only (Fig. 2d, e). Notably, we also observed a
population of Pax8+ Ecad− cells in the interior of each aggregate,
suggesting formation of mid-hindbrain tissue in this region (Supplementary
Fig. 7). The percentage of Pax8+ Ecad+ epithelium markedly increased
between days 6 and 8 (Fig. 2f, g and Supplementary Fig. 8a-e) and the
Pax8+ Ecad+ epithelium bore a striking morphological resemblance
to the developing otic placode (Supplementary Fig. 6). Of note, we did 
not observe expression of Pax3 or Pax6 in the outer epithelium, ruling out 
the development of other cranial placodes (Supplementary Fig. 7c-g). 
Taken together, these findings show that FGF/LDN treatment is critic- 
ally important for in vitro otic placode induction and that treatment is 
most effective when performed between days 4 and 5 (Supplementary 
Fig. 8). 

In vivo, the prosensory domain of the otic placode/vesicle gives rise 
to the vestibular/cochlear sensory epithelia and inner ear sensory neu- 
rons. Otic prosensory cells are defined by the expression of Pax2, Pax8, 
Ecad, Sox2 and jagged 1 (Jag1) (Supplementary Figs 6 and 12). On day 
8 of differentiation, BMP/SB-FGF/LDN aggregates were transferred to 
a serum-free floating culture to allow self-guided differentiation. In 
each aggregate analysed, approximately 24 h after transfer, the interior 
cell mass breached the outer epithelium and formed a heterogeneous 
cell layer on the exterior of the aggregate (n = 253 aggregates; Fig. 2h, 
m and Supplementary Fig. 9). This indicated that the outer epithelium 
transitions to an inner epithelium lining the core of each aggregate. 
During days 9-12 we observed the continuous evagination of vesicles
containing Pax2+ Ecad+, Pax2+ Pax8+ and Sox2+ Pax8+ cells from
the presumptive OEPD epithelium into the exterior cell layer (Fig. 2h-k
and Supplementary Figs 9 and 10), which resulted in ~25 Pax2+
Pax8+ Sox2+ vesicles per aggregate (Fig. 2l). We proposed that
endogenous Wnt signalling may underlie induction of vesicles bearing otic
prosensory markers in our culture because Wnt signalling is necessary
for otic placode formation in vivo’®. Confirming this hypothesis,
treatment of aggregates with the Wnt inhibitor XAV939 from days 8-10
significantly decreased the number of prosensory vesicles and,
specifically, reduced the prevalence of Pax2+ vesicles (Fig. 2l). These data
indicate that endogenous Wnt signalling induces the formation of otic
vesicles from the presumptive otic placode using mechanisms similar to
those observed in vivo. Interestingly, the remaining inner epithelium
developed into Krt5+ p63+ epidermis and the exterior layer of cells gave rise
to mesenchymal tissues such as cartilage and adipose (Supplementary Fig.
11). The basal (p63+) layer of the inner epithelium was oriented so that
the apical surface of the epithelium was facing the interior of the
aggregate. Thus, the process of vesicle evagination towards the outside of
the aggregate is consistent with the orientation of embryonic otic vesicle
invagination into the head mesenchyme (Supplementary Fig. 11).
During development, the prosensory domain of the otic vesicle
is destined to become sensory epithelia harbouring Myo7a+ sensory
hair cells. Surprisingly, the epithelia of Sox2+ Jag1+ vesicles became
Myo7a+ by day 14, mimicking the diffuse Myo7a staining pattern in
the embryonic day (E)9.5 otic vesicle (Fig. 3a, b and Supplementary
Fig. 12a-f). By day 16, we found that each aggregate contained
15.4 ± 4.8 (n = 12 aggregates) vesicles lined with Myo7a+ Sox2+ cells
bearing the stereotyped morphology of sensory hair cells, with a large
nucleus (~8-μm diameter) positioned basal to an elongated apical end
(Fig. 3c). The Myo7a+ Sox2+ cells were organized in a radial pattern
with the apical end abutting a lumen of varying size (~5-1,100-μm
long-axis diameter; Fig. 3c-e). Basal to each layer of Myo7a+ Sox2+
cells was a tightly arranged layer of Sox2+ cells reminiscent of
supporting cells (Fig. 3c-j and Supplementary Video 1). Mimicking the in vivo
sensory epithelia, hair cells and supporting cells could be further
distinguished by expression of Brn3c (also known as Pou4f3) and cyclin
D1, respectively (Supplementary Fig. 13a-f)’*. F-actin staining
revealed tight cell-cell junctions along the luminal surface as well as
F-actin+ espin (Espn)+ stereocilia bundles (Fig. 3k-o, Supplementary
Fig. 13g-i and Supplementary Video 2). Every Myo7a+ cell analysed
also had an acetylated-α-tubulin+ kinocilium protruding from the
apical end into the lumen (Fig. 3m, n and Supplementary Fig. 13j-m).
Stereocilia and kinocilium were not visible at day 16, but the average
height increased from day 20 to day 24 and fell within the range of
heights recorded from an adult mouse utricle (Fig. 3o)**. The hair cells
also appeared to be functional on the basis of rapid uptake of FM1-43 dye
and the diversity of voltage-dependent currents (Fig. 3p-r and
Supplementary Fig. 14)°°?*. In all cells included in this study we observed
outwardly rectifying potassium currents with voltage-dependent
activation kinetics and amplitudes ranging from 194 pA to 3,612 pA, with a mean
of 1,003 ± 527 pA (n = 6; Fig. 3r). In addition, some cells were
distinguished by the presence of a transient inward current, probably
reflecting calcium channel activity (Supplementary Fig. 14k, l). By day 20 each
BMP/SB-FGF/LDN aggregate contained 1,552.3 ± 83.1 Myo7a+ cells
with typical hair cell morphology (~1-2% of all cells in the aggregate),
in marked contrast to other conditions, which yielded no Myo7a+ cells
(n = 12-16 aggregates per condition; Fig. 3s, t). We conclude from these
data that the cytoarchitecture, cellular morphology and functional
characteristics observed in Myo7a+ Sox2+ vesicles are identical to those of
sensory epithelia in the inner ear.

There are four distinct populations of hair cells in the mammalian
inner ear: type I and type II vestibular hair cells, and inner and outer
cochlear hair cells. We wished to determine which type of hair cells populated
the stem-cell-derived sensory epithelia in our culture. Previous studies
have suggested that expression of Pax2 and Sox2 may distinguish
vestibular from cochlear hair cells*””*. In addition, expression of the
calcium-binding protein calbindin 2 (Calb2) and Sox2 uniquely labels
type II vestibular hair cells, whereas calyceal innervation from sensory
neurons identifies type I vestibular hair cells (Fig. 4a)*”°"°. On day 20,
nearly all stem-cell-derived hair cells were Sox2+ Pax2+ (n > 250 hair
cells; Figs 3c-e, i, 4b). Moreover, every hair cell expressed Calb2,
suggesting a uniform population of type II vestibular hair cells (Fig. 4c, e).
From a structural standpoint, we noted the presence of larger lumen
vesicles (3.7 ± 0.3 per aggregate, defined as >50-μm long-axis
diameter, n = 15 aggregates) with regions of sensory (with hair cells)
and nonsensory (without hair cells) epithelia identical in organization
to a vestibular end organ (Fig. 4d-g and Supplementary Figs 14b, h and
15). Intriguingly, we also observed discrete populations of Calb2+ and
Brn3a+ Tuj1 (also known as Tubb3)+ neurofilament (Nefl)+ neurons
extending processes towards the sensory epithelia (Fig. 4h and
Supplementary Fig. 16a-d). We were surprised to find that, by day 16, hair
cells exhibited punctate expression of ribeye (Ctbp2) colocalized with
the postsynaptic and neuronal markers Tuj1, synaptophysin (Syp),
Snap25 and Rab3a, indicating the formation of ribbon synapses with
adjacent neurons (Fig. 4h-j and Supplementary Fig. 16e-j). Notably,
the number of ribbon synapses increased over time in culture,
suggesting a maturation process similar to that of native hair cells in the
inner ear (Fig. 4k and Supplementary Fig. 16h-j)°°. Together, these results
indicate that stem-cell-derived vesicles in our culture represent immature
vestibular end organs, specifically the utricle and/or saccule’*”*”*.

Figure 3 | Stem-cell-derived otic vesicles generate functional inner ear hair
cells. a, b, Expression of Myo7a in the E9.5 otic vesicle (OtV) (a) and day 14
vesicles (b). nb, neuroblasts. c-e, Myo7a+ Sox2+ hair cells (hc) with underlying
Sox2+ supporting cells (sc) on days 15 (c) and 16 (d, e). f-i, Whole-mount
immunofluorescence for Myo7a and Sox2 (f) and three-dimensional
reconstruction (g-i) of a vesicle in a day 20 BMP/SB-FGF/LDN aggregate.
j, Vesicles display the hallmarks of inner ear sensory epithelia. k-m, F-actin
labels cell-cell junctions on the luminal surface and stereocilia bundles.
m, Acetylated α-tubulin (tubulin) labels kinocilium and the cuticular plate.
n, Transmission electron micrograph (TEM) of stereocilia bundles and
kinocilium (arrow). o, Distribution of stereocilia and kinocilium heights on
FM1-43FX incubation, fixation and staining for F-actin. q, Representative
epithelium preparation (inset) and hair cell during electrophysiological
recordings. r, Representative voltage-current responses recorded from hair
Myo7a+ Sox2+ vesicles. Epidermis is indicated by dashed outline. t, Number of

In conclusion, our findings highlight a binary mechanism of BMP
activation and TGF-β inhibition underlying in vitro non-neural
ectoderm induction. Furthermore, subsequent inhibition of BMP
signalling concomitant with activation of FGF signalling is required for
preplacodal induction. Notably, the formation of these precursors is
sufficient to trigger self-guided induction of the sensory epithelia, from
which hair cells with structural and functional properties of native
mechanosensitive hair cells in the inner ear spontaneously arise in
significant numbers (~1,500 hair cells per aggregate; see Fig. 4l).
This new approach not only can be used as a potent model system
to elucidate the mechanisms underlying inner ear development, but
will also provide an easily accessible and reproducible means of
generating hair cells for in vitro disease modelling, drug discovery or
cellular therapy experiments.

Figure 4 | Stem-cell-derived sensory epithelia are comparable to immature
vestibular end organs. a, Schematic of vestibular end organs and type I/II
vestibular hair cells. vgn, vestibular ganglion neurons. b, c, Pax2 (b) and Calb2
(c) are expressed in all Myo7a+ stem-cell-derived hair cells on day 20. CyclinD1
(cD1) is expressed in supporting cells. d-g, The structural organization of vesicles
with Calb2+ Myo7a+ hair cells mimics the E18 mouse saccule (sagittal view) in
vivo. nse, nonsensory epithelium. h, Tuj1+ neurons extending processes to hair
cells. i, The synaptic protein Snap25 is localized to the basal end of hair cells. j, The
postsynaptic marker Syp colocalizes with Ctbp2 (arrowheads and inset). hcn, hair
cell nucleus. k, Quantification of synapses on day 16, 20 and 24 hair cells (n > 100
cells, *P < 0.05, ***P < 0.001; mean ± s.d.). l, Overview of in vitro differentiation.


METHODS SUMMARY 


The SFEBq culture system was performed as described previously, but with major
modifications’. On day 3 of the protocol, BMP4 (10 ng ml⁻¹) and SB-431542
(1 μM) were added to each well at 5× concentration in 25 μl of fresh media. On
day 4.5, FGF2 (25 ng ml⁻¹) and LDN-193189 (100 nM) were added to each well at
6× concentration in 25 μl of fresh media.


Full Methods and any associated references are available in the online version of
the paper.

METHODS

ESC culture. ESCs derived from blastocyst-stage embryos of R1 mice were
maintained in feeder-free conditions using 2i-LIF medium as previously described*’. In
brief, ESCs were maintained on gelatin and used for experimentation until passage 40.
N2B27 medium consisted of a 1:1 mixture of Advanced DMEM/F12 and Neurobasal
medium (Invitrogen) supplemented with 1 mM GlutaMAX (Invitrogen) and 1 mM
penicillin/streptomycin (STEMCELL Technologies). 2i-LIF medium was made by
supplementing N2B27 medium with 1,000 U ml⁻¹ leukaemia inhibitory factor (LIF;
Millipore), 3 μM CHIR99021 (Stemgent) and 1 μM PD0325901 (Santa Cruz).

Days 0-3 of SFEBq differentiation were performed as described with slight
modifications*’. In brief, ESCs were dissociated with 0.25% trypsin-EDTA,
resuspended in differentiation medium and plated 100 μl per well (3,000 cells) on
96-well low-cell-adhesion U-bottom plates (Lipidure Coat, NOF). Differentiation
medium was G-MEM supplemented with 1.5% knockout serum replacement
(Invitrogen), 0.1 mM nonessential amino acids, 1 mM sodium pyruvate, 1 mM
penicillin/streptomycin and 1 mM 2-mercaptoethanol. On day 1, half of the medium
in each well was exchanged for fresh differentiation medium containing Matrigel
(2% (v/v) final concentration). On day 3 of the protocol, BMP4 (10 ng ml⁻¹) and
SB-431542 (1 μM) were added to each well at 5× concentration in 25 μl of fresh media.
On days 4-5, FGF2 (25 ng ml⁻¹) and LDN-193189 (100 nM) were added to each
well at 6× concentration in 25 μl of fresh media. The concentration of Matrigel was
maintained at 2% (v/v) throughout days 1-8. On day 8 of differentiation, cell
aggregates were transferred to 24-well plates (Lipidure Coat, NOF; 4-8 aggregates per
well) in N2 medium containing 1% (v/v) Matrigel. N2 medium contained Advanced
DMEM/F12, 1× N2 Supplement, 1 mM penicillin/streptomycin or 50 μg ml⁻¹
Normocin (InvivoGen) and 1 mM GlutaMAX. For some experiments small
molecules were added to N2 medium before plating the aggregates. Half of the medium
was changed every day during long-term floating culture for up to 30 days.
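
The 5× and 6× stock strengths quoted for days 3 and 4-5 follow from the well volumes. The sketch below checks that arithmetic. It assumes that each well still holds 100 μl on day 3 (the day 1 half-medium exchange leaves the volume unchanged) and that evaporation is negligible; neither assumption is stated in the protocol, and the function names are illustrative only.

```python
def stock_fold(volume_before_ul, added_ul):
    """Fold-concentration an added aliquot needs so the well ends at 1x.

    Assumes simple additive volumes (no evaporation or pipetting loss).
    """
    return (volume_before_ul + added_ul) / added_ul


def final_concentration(stock_conc, added_ul, volume_before_ul):
    """Concentration of the added factor after mixing into the well."""
    return stock_conc * added_ul / (volume_before_ul + added_ul)


# Day 3: 25 ul of BMP4/SB-431542 stock added to an assumed 100 ul well.
day3_fold = stock_fold(100, 25)      # 125 / 25
# Days 4-5: 25 ul of FGF2/LDN-193189 stock added to the resulting 125 ul.
day4_fold = stock_fold(125, 25)      # 150 / 25

# A 5x BMP4 stock (50 ng/ml) then gives the stated 10 ng/ml final,
# and a 6x FGF2 stock (150 ng/ml) gives the stated 25 ng/ml final.
bmp4_final = final_concentration(50, 25, 100)
fgf2_final = final_concentration(150, 25, 125)

print(day3_fold, day4_fold, bmp4_final, fgf2_final)
```

Under these assumptions the two additions bring the well to 125 μl and then 150 μl, which is why the second aliquot has to be prepared at 6× rather than 5×.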
Signalling molecules and recombinant proteins. The following small molecules
and recombinant proteins were used: recombinant human BMP4 (10 ng ml⁻¹;
Stemgent), human FGF2 (25 ng ml⁻¹; PeproTech), XAV939 (1 μM; Santa
Cruz), SU5402 (10 μM; BioVision), SB-431542 (1 μM; Tocris Bioscience) and
LDN-193189 (100 nM; Stemgent). Notably, we have obtained comparable results
using concentrations of up to 1 μM LDN-193189.
Quantitative PCR. As described previously*’, RNA was isolated using the RNeasy 
Minikit (Qiagen) and treated with TURBO DNase (Ambion). Single-stranded 
complementary DNA was synthesized using Omniscript reverse transcriptase 
(Qiagen) and Oligo-dT primers. All amplicons had standardized sizes of 100- 
110 base pairs. cDNA samples were amplified on an ABI PRISM 7900HT 
Sequence Detection System (Applied Biosystems) using the SYBR Green PCR 
Master Mix (Applied Biosystems). For each PCR reaction, a mixture containing 
cDNA template (5 ng), Master Mix, and forward and reverse primers (400 nM
each) was treated with uracil N-glycosylase at 50 °C for 2 min before undergoing
the following program: 1 cycle, 95 °C, 10 min; 45 cycles, 95 °C, 15 s, 60 °C, 1 min; 1
cycle, 95 °C, 15 s, 60 °C, 15 s, 95 °C, 15 s; 72 °C, hold. Melting curve analysis was
performed to confirm the authenticity of the PCR product. The mRNA level for 
each gene was calculated relative to L27 mRNA expression. 

Primers used: Dlx3 forward: CAGTACGGAGCGTACCGGGA, reverse:
TGCCGTTCACCATGCGAACC; Sox1 forward: AACCAGGATCGGGTCAAG, reverse:
ATCTCCGAGTTGTGCATCTT; brachyury forward: CACACGGCTGTGAGAGGTACCC,
reverse: TGTCCGCATAGGTTGGAGAGCTC; Pax8 forward: CGGCGATGCCTCACAACTGG,
reverse: TGGGCCAAGTCCACAATGCG; Pax2 forward: CCCGTTGTGACCGGTCGTGATAT,
reverse: TGGGTTGCCTGAGAACTCGCTC.
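
The text states only that mRNA levels were calculated relative to L27 expression. One common way to do this is the comparative Ct calculation, sketched below. This is an assumption about the method rather than a description of the authors' analysis: it presumes near-perfect doubling per cycle, and the Ct values shown are invented for illustration.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target mRNA level relative to a reference gene (e.g. L27).

    Uses efficiency ** (Ct_reference - Ct_target); an efficiency of 2.0
    corresponds to perfect doubling of product in every PCR cycle.
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values: Pax8 crosses threshold 5 cycles after L27.
pax8_vs_l27 = relative_expression(ct_target=25.0, ct_reference=20.0)
print(pax8_vs_l27)  # 2 ** -5
```

A target that crosses threshold five cycles later than the reference is therefore reported at 1/32 of the reference level under the perfect-doubling assumption.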
Immunohistochemistry. Aggregates were fixed with 4% paraformaldehyde. The 
fixed specimens were cryoprotected with a graded treatment of 10, 20 and 30% 
sucrose and then embedded in tissue freezing medium. Frozen tissue blocks were 
sectioned into 10- or 12-μm cryosections. For immunostaining, a 3% goat or horse
serum and 0.1% Triton X-100 solution was used for primary antibody incubation. 
An Alexa Fluor 488-conjugated anti-mouse IgG or anti-rat IgG and an Alexa Fluor 
568-conjugated anti-rabbit IgG (Invitrogen) were used as secondary antibodies. A 
DAPI counterstain was used to visualize cellular nuclei (Vector, VectaShield). For 
whole-mount staining, aggregates were placed directly into blocking solution with 
1% Triton X-100 following fixation. For confocal imaging and three-dimensional 
reconstruction experiments, following secondary antibody incubation, aggregates 
were cleared using ScaleA2 solution for 1-2 days followed by ScaleB4 treatment for 
another 2 days as described previously**. Microscopy was performed on a Nikon 
TE2000 inverted microscope or an Olympus FV 1000-MPE confocal/multiphoton 
microscope. Three-dimensional reconstruction was performed using Voxx (cus- 
tom software developed by Indiana Center for Biological Microscopy). 

The following antibodies were used: anti-E-cadherin (rabbit, Abcam; mouse,
BD Biosciences), anti-N-cadherin (mouse, BD Biosciences), anti-Sox1 (rabbit, Cell
Signaling Technologies), anti-Nanog (rabbit, Abcam), anti-brachyury (goat, Santa
Cruz Biotechnology), anti-AP2α (mouse, DSHB), anti-Pax8 (rabbit, Abcam),
anti-Pax2 (rabbit, Invitrogen; mouse, Abnova), anti-Sox2 (mouse, BD Biosciences),
anti-Jag1 (rabbit, LSBio), anti-p27Kip1 (mouse, BD Biosciences), anti-myosin VIIa
(rabbit, Proteus), anti-acetylated-α-tubulin (mouse, Abcam), anti-Tuj1 (mouse,
Covance), anti-Calb2 (mouse, Millipore), anti-Caspr1 (mouse, NeuroMAB),
anti-Caspr2 (mouse, NeuroMAB), anti-p63 (mouse, Santa Cruz Biotechnology),
anti-cytokeratin 5 (rabbit, Sigma), anti-Nefl (rabbit, Millipore), anti-Brn3a (mouse,
Millipore), anti-islet1 (mouse, DSHB), anti-Syp (rabbit, Invitrogen), anti-Brn3c
(mouse, Santa Cruz Biotechnology), anti-Ctbp1 and anti-Ctbp2 (mouse, BD
Biosciences), anti-Rab3 (mouse, BD Biosciences), anti-Snap25 (mouse, BD
Biosciences), anti-Pax6 (rabbit, Abcam), anti-Pax3 (mouse, DSHB), anti-aPKC
(rabbit, Santa Cruz Biotechnology), anti-laminin-B1 (rat, Abcam). For most of the
antibodies, mouse embryonic tissue sections were used as positive controls. Mouse
embryos were dissected from timed-pregnant CD-1 mice using a protocol approved
by the Institutional Animal Care and Use Committee at Indiana University School 
of Medicine. The embryo fixation and processing procedure was identical to that 
used for cell aggregates. 

The Alcian blue staining procedure was modified from a previously reported
method”. In brief, cryosections were incubated in Alcian blue staining solution for
10 min and subsequently de-stained using 60% ethanol/40% acetic acid for 20 min. 
A final eosin stain was performed for 30s. For Oil Red O staining, cryosections 
were kept in 60% isopropanol for 2 min and then placed in freshly prepared Oil 
Red O stain for 5 min followed by a 30-s haematoxylin stain. 
Image analysis. The percentage of epithelial cells expressing Pax8 and Ecad was 
established by analysing serial sections of day 6 and day 8 aggregates. Data are 
representative of 6-8 aggregates from at least 3 separate experiments. For analysis 
of each aggregate, 5 cryosections were chosen at random positions along the z-axis 
of the aggregate. Using Nikon Elements or NIH ImageJ software, the Ecad+ outer
epithelium was outlined, and counting of DAPI-stained and Pax8+ nuclei along the
length of the epithelium established a percentage for each cryosection.
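
The per-section percentage described above can be sketched as follows. The nuclei counts are hypothetical, and averaging the five section percentages into one value per aggregate is an assumed summary step; the text specifies only that a percentage was established for each cryosection.

```python
from statistics import mean


def percent_positive(positive_nuclei, total_nuclei):
    """Percentage of DAPI-stained nuclei in the outlined epithelium that are Pax8+."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    return 100.0 * positive_nuclei / total_nuclei


def aggregate_percentage(section_counts):
    """Mean of the per-section percentages for one aggregate (assumed summary).

    section_counts is a list of (Pax8+ nuclei, DAPI nuclei) pairs,
    one pair per randomly chosen cryosection.
    """
    return mean(percent_positive(pos, total) for pos, total in section_counts)


# Hypothetical counts for five cryosections of one day 8 aggregate.
sections = [(30, 60), (28, 70), (33, 66), (25, 50), (36, 80)]
print(round(aggregate_percentage(sections), 1))
```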

The apparent thickness of epithelia was determined by analysing cryosections 
stained with Ncad (control) or Ecad (all other conditions) on days 3-6. Data are 
representative of 6-8 aggregates from at least 3 separate experiments. For each 
aggregate, 3 serial sections were analysed. Five points along the epithelium were 
randomly chosen and the thickness was measured using Nikon Elements image 
analysis tools. 

Similarly, the number of Myo7a+ hair cells in each day 20 aggregate was
determined by analysing 10-μm serial cryosections. Each biological sample
represents the average number of hair cells counted in 4-6 cell aggregates and data are
representative of the average from 3 separate experiments (15 aggregates total for
each condition). Odd and even numbered cryosections were analysed separately
and averaged to avoid double counting. The number of vesicles was quantified
similarly, but every third section was analysed to avoid double counting and allow
for analysis of 3 separate staining combinations. Vesicles with a long-axis diameter
larger than 30 μm were accounted for to avoid double counting.
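
One reading of the odd/even procedure is that a cell cut by the sectioning plane can appear in two adjacent sections, so the odd-numbered and even-numbered series each give an independent estimate of the total, and the two estimates are averaged. The sketch below implements that reading; it is an interpretation of the sentence above, not the authors' code, and the counts are invented.

```python
def hair_cells_per_aggregate(section_counts):
    """Estimate cells per aggregate from serial-section counts.

    section_counts[i] is the number of Myo7a+ cells counted in serial
    section i + 1. Odd- and even-numbered sections are summed separately
    and the two totals are averaged (assumed interpretation).
    """
    odd_total = sum(section_counts[0::2])    # sections 1, 3, 5, ...
    even_total = sum(section_counts[1::2])   # sections 2, 4, 6, ...
    return (odd_total + even_total) / 2


# Hypothetical counts from eight consecutive 10-um sections.
counts = [120, 131, 140, 138, 150, 144, 129, 118]
print(hair_cells_per_aggregate(counts))
```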

Stereocilia heights were determined by measuring the apparent length of
F-actin-labelled structures protruding from Myo7a+ hair cells on days 20 and
24. Likewise, kinocilium heights were determined by measuring the apparent
length of acetylated-α-tubulin-labelled protrusions from Myo7a+ hair cells.
Regions of interest were chosen randomly for analysis and >100 cells were
analysed across 3-5 separate epithelia for the data shown in Fig. 3.

Synapses were quantified by analysing day 16, 20 and 24 aggregate sections
stained for Syp and Ctbp2 using a previously described method**. Regions of
interest were chosen randomly for analysis and >100 cells were analysed across
4-5 separate epithelia from 3 separate experiments for the data shown in Fig. 4.
Confocal z-stacks were taken of Ctbp2-stained hair cells. The maximum-intensity
projections were used to count the number of Ctbp2+ puncta surrounding each
hair cell nucleus.

Statistical analysis. Statistical significance was determined using a Student’s t-test 
for comparison of two groups or a one-way analysis of variance followed by 
Tukey’s post-hoc test for multiple comparisons, unless stated otherwise. All data 
were analysed using Prism 6 or Microsoft Excel software. 
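
For the two-group case, the Student's t statistic can be computed by hand as in the sketch below. It assumes the equal-variance (pooled) form of the test, which the text does not specify, and uses invented measurements. Converting t to a P value requires the t distribution, which Prism or a statistics library would supply.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance Student's t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical epithelial thickness measurements (um) for two conditions.
control = [10.2, 11.1, 9.8, 10.5]
treated = [14.9, 15.6, 14.2, 16.0]
t_value, dof = student_t(control, treated)
print(round(t_value, 2), dof)
```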

FM1-43 labelling. The presence of functional mechanosensitive channels was 
confirmed using a FM1-43 dye uptake assay similar to previous studies**””*. 
Large lumen aggregates (that is, >500-μm long-axis diameter), identified by their
translucency and spherical morphology relative to surrounding tissue, were used
for these experiments. Aggregates were incubated in DMEM-F12 containing
FM1-43FX (5 μM; Invitrogen) for 1 min and then washed 3× in fresh N2 medium.
A faint cellular outline caused by autofluorescence was used to identify potential 
hair cells in the vesicle wall. In N2 medium, a 0.25-|1m tungsten needle was used to 
puncture each vesicle in an area away from the site of potential hair cells. The 
punctured vesicles were incubated in DMEM-F12 containing FM1-43FX (5 uM) 
for 1 min with gentle rocking and then washed 3× in fresh N2 medium. Vesicles
were imaged to confirm dye uptake and immediately fixed with 4%
paraformaldehyde. For some experiments, epithelia were fixed and incubated in PBS containing
1% Triton X-100 and phalloidin conjugated to Alexa Fluor 647 (Invitrogen) to
confirm the identity of hair cells.

Electrophysiological recordings. On day 24 of differentiation, large lumen
vesicles (>500-μm diameter) were dissected from cell aggregates following a 30-min
incubation in DMEM/F12 containing dispase (STEMCELL Technologies). 
Epithelial regions containing hair cells were identified on the basis of a thickened 
morphology relative to the rest of the vesicle epithelium. Two incisions were made 
using tungsten needles on the opposite side of the vesicle in order to expose and 
flatten the hair-cell-containing epithelium. The flattened epithelium was mounted 
onto a round glass coverslip and held in position by two wires glued to the coverslip
using MDX4-4210 (Corning). The coverslip was then placed in a submersion-type
slice chamber mounted on the stage of a Nikon E600FN Eclipse microscope.
Electrophysiological recordings were performed under continuous perfusion of
oxygenated artificial cerebrospinal fluid that contained the following (in mM):
130 NaCl, 3.5 KCl, 1.1 KH2PO4, 1.3 MgCl2, 2.5 CaCl2, 30 NaHCO3, 10 glucose,
pH 7.4 (320 mOsm kg⁻¹). Recording pipettes were pulled from borosilicate
capillary glass (WPI) with resistances ranging from 2 to 3 MΩ. Recording pipettes were
filled with a potassium gluconate-based recording solution that contained the
following (in mM): 130 K-gluconate, 3 KCl, 3 MgCl2, 5 phosphocreatine,
2 K-ATP, 0.2 NaGTP, 10 HEPES, pH 7.3 (290 mOsm kg⁻¹). Whole-cell access
resistances were monitored throughout each experiment and ranged from
5 to 20 MΩ; a change of 15% was deemed acceptable.

Hair cells were identified with a 40× water-immersion objective and differential
interference contrast. Only cells with hair bundles on their apical surface were
chosen for recording. Positive pressure was maintained as the recording pipette
was lowered into the epithelium. When the recording pipette touched the
membrane, positive pressure was released and a tight seal was formed. Recordings were
obtained at 30 °C using a solution inline heater (Warner Instruments). The cells
were held at −60 mV, and data were acquired using the whole-cell technique in
voltage-clamp mode using a Multiclamp 700B amplifier (Molecular Devices)
coupled to a Digidata 1332A board (Molecular Devices). The data were analysed
using pClamp 10.2 (Molecular Devices). All chemicals were purchased from
Sigma-Aldrich. 

Transmission electron microscopy. Day 24 aggregates were fixed in 2%
paraformaldehyde/2% glutaraldehyde in 0.1 M phosphate buffer. After fixation the
specimens were rinsed with phosphate buffered saline followed by post-fixation
with 1% osmium tetroxide. Thereafter, the aggregates were dehydrated through a
series of graded ethyl alcohols and embedded in Embed 812 (Electron Microscopy
Sciences). Ultra-thin sections (70-80 nm) were cut, stained with uranyl acetate and
viewed on a Tecnai BioTwin (FEI) transmission electron microscope at 80 kV.
Digital images were taken with an Advanced Microscope Techniques
charge-coupled device camera.

Western blot analysis. Cell aggregates were lysed in radioimmunoprecipitation 
assay buffer supplemented with a protease inhibitor cocktail (Roche). Cell extracts 
were centrifuged at 13,000 r.p.m., 4 °C for 10 min to remove insoluble debris and 
chromosomal DNA. Proteins were separated by denaturing SDS-PAGE and 
transferred to PVDF membranes (Bio-Rad). After blocking, membranes were
incubated with a primary antibody overnight at 4 °C. An anti-β-actin (Sigma) antibody
was used for confirmation of equal loading of the samples. Blots were detected
with a horseradish peroxidase-conjugated goat anti-rabbit or rabbit anti-mouse
antibody (Invitrogen) and visualized with the SuperSignal West Pico or -Femto 
chemiluminescent detection system (Pierce) and exposed to X-ray film. 


31. Ying, Q.-L. et al. The ground state of embryonic stem cell self-renewal. Nature 453,
519-523 (2008).

32. Eiraku, M. & Sasai, Y. Mouse embryonic stem cell culture for generation
of three-dimensional retinal and cortical tissues. Nature Protocols 7, 69-79
(2012).

33. Koehler, K. R. et al. Extended passaging increases the efficiency of neural
differentiation from induced pluripotent stem cells. BMC Neurosci. 12, 82
(2011).

34. Hama, H. et al. Scale: a chemical approach for fluorescence imaging and
reconstruction of transparent mouse brain. Nature Neurosci. 14, 1481-1488
(2011).

35. Jegalian, B. G. & De Robertis, E. M. Homeotic transformations in the mouse
induced by overexpression of a human Hox3.3 transgene. Cell 71, 901-910
(1992).

36. Coate, T. M. et al. Otic mesenchyme cells regulate spiral ganglion axon
fasciculation through a Pou3f4/EphA4 signaling pathway. Neuron 73, 49-63
(2012).

37. Gale, J. E., Marcotti, W., Kennedy, H. J., Kros, C. J. & Richardson, G. P. FM1-43 dye
behaves as a permeant blocker of the hair-cell mechanotransducer channel. J.
Neurosci. 21, 7013-7025 (2001).

38. Hu, Z. & Corwin, J. T. Inner ear hair cells produced in vitro by a mesenchymal-to-
epithelial transition. Proc. Natl Acad. Sci. 104, 16675-16680 (2007).


©2013 Macmillan Publishers Limited. All rights reserved 


doi:10.1038/nature12362 


Vitamin C induces Tet-dependent DNA
demethylation and a blastocyst-like state in ES cells

DNA methylation is a heritable epigenetic modification involved in
gene silencing, imprinting, and the suppression of retrotransposons'.
Global DNA demethylation occurs in the early embryo and the
germ line**, and may be mediated by Tet (ten eleven translocation)
enzymes**, which convert 5-methylcytosine (5mC) to
5-hydroxymethylcytosine (5hmC)’. Tet enzymes have been studied extensively
in mouse embryonic stem (ES) cells*”, which are generally cultured in
the absence of vitamin C, a potential cofactor for Fe(II) 2-oxoglutarate
dioxygenase enzymes such as Tet enzymes. Here we report that
addition of vitamin C to mouse ES cells promotes Tet activity,
leading to a rapid and global increase in 5hmC. This is followed by
DNA demethylation of many gene promoters and upregulation
of demethylated germline genes. Tet1 binding is enriched near
the transcription start site of genes affected by vitamin C treat- 
ment. Importantly, vitamin C, but not other antioxidants, enhances 
the activity of recombinant Tet1 in a biochemical assay, and the 
vitamin-C-induced changes in 5hmC and 5mC are entirely sup- 
pressed in Tet1 and Tet2 double knockout ES cells. Vitamin C has 
a stronger effect on regions that gain methylation in cultured ES 
cells compared to blastocysts, and in vivo are methylated only after 
implantation. In contrast, imprinted regions and intracisternal A 
particle retroelements, which are resistant to demethylation in the 
early embryo””, are resistant to vitamin-C-induced DNA demethy- 
lation. Collectively, the results of this study establish vitamin C as a 
direct regulator of Tet activity and DNA methylation fidelity in ES 
cells. 

ES cells are derived from the inner cell mass (ICM) of the blastocyst 
and can be cultured in vitro to maintain a pluripotent state. Media 
composition has been shown previously to influence ES-cell heterogeneity,
gene expression and epigenetic patterns. Our study began with
the serendipitous observation that culture of mouse ES cells in knockout 
serum replacement (KSR) strongly and reversibly induces expression of 
the germline gene Dazl (Supplementary Fig. 1a, b). We performed a 
small-molecule screen and identified vitamin C as the KSR component 
responsible for Dazl induction (Supplementary Fig. 2). Dazl induction 
was also observed with the DNA methyltransferase (Dnmt) inhibitor 
5-azacytidine, suggesting that vitamin C may promote DNA demethy- 
lation. Vitamin C enhances the activity of some Fe(II) 2-oxoglutarate 
dioxygenases, and we therefore reasoned that vitamin C could promote
the activity of Tet enzymes, leading to DNA demethylation and
Dazl induction. As mouse ES cells are commonly cultured without 
vitamin C, we set out to test the effect of vitamin C on the epigenetic 
and transcriptional state of ES cells. 

Vitamin C treatment of naive ES cells cultured with MEK and 
GSK3β inhibitors (2i) in N2B27-based medium, which is devoid of
detectable vitamin C (Supplementary Fig. 3), leads to a striking global
increase in 5hmC by immunofluorescence and dot blot (Fig. 1a, b). In
contrast, global levels of 5mC were not altered at 12 or 72h after the 
start of vitamin C treatment (Fig. 1b). To assess dynamic 5hmC and 
5mC changes at specific genomic regions, 5hmC and 5mC DNA immunoprecipitation
followed by deep sequencing (DIP-seq) was performed
at 12 and 72h after vitamin C treatment. We restricted our analysis to 
methylated regions, as 5mC is a prerequisite for 5hmC. 

Notably, most methylated promoters transiently gain 5hmC at 12h 
and return to baseline levels or below at 72h, whereas 5mC is lost 
progressively at 12 and 72h (Fig. 1c, Supplementary Figs 4a and 5). 
Reduction of 5hmC at 72 h may be explained by loss of 5mC substrate. 
After 72 h of vitamin C treatment methylation is reduced by twofold or 
more in 61% of analysed promoters (Supplementary Table 1). Demethyla- 
tion at exons, introns and intergenic regions is also observed (Supplemen- 
tary Fig. 4b). There is a highly significant overlap in the promoters that 
gain 5hmC at 12 h and those that lose 5mC at 72 h (P < 2.2 × 10⁻¹⁶,
Fig. 1d and Supplementary Fig. 6). In addition, 5hmC gain and 5mC loss
occur at the same genomic locations near the transcription start site (TSS)
(example in Fig. 1e). These results support a kinetic model in which
oxidation of 5mC to 5hmC precedes DNA demethylation that may occur
through active or passive mechanisms. We confirmed demethylation at 
the promoters of three representative genes by bisulphite sequencing 
(Fig. 1f). Several high-density CpG promoters show minimal demethy- 
lation and many of these were identified as imprinted genes (Fig. 1g), 
indicating that certain regions of the genome are resistant to vitamin-C- 
induced demethylation. 
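
The overlap significance reported above (Fisher's exact test, Fig. 1d) can be illustrated with a minimal Python sketch. It assumes a one-sided test for enrichment on a 2 × 2 table of promoters; the function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: promoters that gain 5hmC and lose 5mC
    b: promoters that gain 5hmC only
    c: promoters that lose 5mC only
    d: promoters that do neither
    Returns the probability of an overlap of at least `a` under independence.
    """
    row1 = a + b          # promoters that gain 5hmC
    col1 = a + c          # promoters that lose 5mC
    n = a + b + c + d     # all methylated promoters analysed
    total = comb(n, col1)
    tail = 0
    # Sum hypergeometric probabilities for overlaps as large as observed or larger.
    for k in range(a, min(row1, col1) + 1):
        tail += comb(row1, k) * comb(n - row1, col1 - k)
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the call; they sum to 1,045.
    print(fisher_exact_greater(600, 100, 50, 295))
```

A very small result indicates that the two promoter sets overlap more than expected by chance. Statistical packages typically report such values only down to a floor set by floating-point precision.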

Although dot blot analysis indicates that the global rise in 5hmC is 
sustained at 72 h (Fig. 1b), DIP-seq indicates that promoters return to 
baseline 5hmC levels at this time (Fig. 1c). This apparent discrepancy
may be explained by prolonged retention of 5hmC on repetitive ele- 
ments, which cover a large portion of the genome. Indeed, we find that 
intracisternal A particle (IAP) endogenous retroviruses (ERVs) gain 
5hmC at 12h and maintain elevated levels after 72h of vitamin C 
treatment (Fig. 1h). IAP retroelements are also resistant to vitamin- 
C-induced demethylation, as are other repetitive elements, which may 
explain the maintenance of 5mC observed in the dot blot (Fig. 1h, 
Supplementary Fig. 7a, b and Supplementary Table 2). The increase 
in 5hmC at IAP retroelements does not correspond to a loss in 5mC, 
which could occur if only a small fraction of methylated CpGs within 
these ERVs gain 5hmC, resulting in no detectable loss of overall 
methylation by DIP. Indeed, bisulphite sequencing reveals that IAP 
retroelements are not demethylated with vitamin C treatment at 72 h 
(Supplementary Fig. 7c). 

The effects of vitamin C are specific, as several other antioxidants 
tested did not increase global 5hmC (Supplementary Fig. 8). The 
effects of vitamin C are also reversible. The global increase in 5hmC 
Figure 1 | Vitamin C induces loss of 5mC at gene promoters through a
transient increase in 5hmC. a, Immunofluorescence for 5hmC. Scale bar,
200 μm. VitC, vitamin C. b, Global 5hmC and 5mC levels assayed by dot blot
analysis. c, Graph shows fold change in DIP-seq reads (reads per kilobase per
million, RPKM) at methylated promoters (n = 1,045) for vitamin-C-treated
cells relative to untreated cells. Values are mean ± s.e.m. d, Overlap of
methylated promoters that gain 5hmC and those that lose 5mC. P value was 


is lost rapidly after 3 days of vitamin C withdrawal, whereas promoter 
5mC increases gradually after vitamin C removal (Supplementary Fig. 9). 

To determine how vitamin C affects gene expression in ES cells, 
microarray experiments were performed. Only approximately 200 genes 
are changed by more than twofold, and most are upregulated (Fig. 2a 
and Supplementary Table 3), consistent with loss of a silencing mark like
5mC. Upregulated genes are enriched on the X chromosome (32.7%
observed versus 3.8% expected) and for germline gene ontology terms
(Fig. 2b, c). Pluripotency gene expression is not affected (Supplemen- 
tary Table 3) and vitamin C treatment does not impair differentiation 
(Supplementary Fig. 10). Importantly, the expression of Tet and Dnmt 
genes is not affected by vitamin C treatment (Supplementary Fig. 11). 

Germline genes are also induced in ES cells lacking Dnmts, as 
reported previously. Out of the 134 vitamin-C-induced genes,
48 (36%) are also upregulated in Dnmt1−/−;Dnmt3a−/−;Dnmt3b−/−
(Dnmt triple knockout) ES cells, which are devoid of DNA methylation 
(Supplementary Fig. 12a, b). Notably, vitamin C further increases 
expression of a subset of these genes in Dnmt triple knockout ES cells 
(Supplementary Fig. 12b), suggesting that vitamin C may regulate gene 
expression by additional mechanisms. For example, vitamin C may 
also stimulate histone demethylases, as has been shown in induced 
pluripotent stem cell (iPS cell) generation.


calculated using Fisher’s exact test. e, Genome browser view of Dazl.
f, Bisulphite sequencing of promoters. Open circles, unmethylated; closed
circles, methylated. g, Scatter plot of methylated promoters comparing change
in methylation (Z score) with CpG content. Red circles, imprinted genes.
h, Graphs show fold change in RPKM at retrotransposons for
vitamin-C-treated cells relative to untreated cells. Values are mean ± s.e.m.


Genes upregulated by vitamin C have higher basal levels of promoter 
methylation in untreated cells (Fig. 2d). When analysis is restricted to 
germline genes (associated with the gene ontology term ‘reproduction’), 
genes upregulated by vitamin C show even higher basal levels of pro- 
moter methylation (Fig. 2d). Furthermore, upregulated genes, particu- 
larly upregulated germline genes, show significant loss of methylation 
(Fig. 2e). Taken together, these results indicate that widespread pro- 
moter demethylation induced by vitamin C promotes the upregula- 
tion of predominantly germ line-associated genes. However, promoter 
demethylation is not sufficient to induce expression of most methylated 
genes, possibly owing to redundant epigenetic silencing mechanisms or 
a lack of activating transcription factors. 

As Tet1, Tet2 and Tet3 are the only known enzymes that oxidize
5mC to 5hmC, we reasoned that the effects of vitamin C would be 
mediated by Tet enzymes. Indeed, vitamin C, but not other antioxidants
such as glutathione or dithiothreitol (DTT), increases recombinant
Tet1 activity in a dose-dependent manner in a biochemical
assay (Fig. 3a). ES cells express two Tet family members, Tet1 and
Tet2, and these enzymes seem to be highly redundant. Tet1 binding
is enriched near the TSS of promoters that gain 5hmC or lose 5mC
with vitamin C treatment (Supplementary Fig. 13). To test whether the
effects of vitamin C are Tet-dependent, we analysed Tet1−/−;Tet2−/−
Figure 2 | Vitamin-C-induced DNA demethylation leads to expression of
germline genes. a, Number of genes differentially expressed after vitamin C
treatment (twofold and P < 0.05 by t-test). b, Chromosomal distribution of
upregulated genes. c, Gene ontology analysis of upregulated genes. d, Box plot 
showing basal promoter methylation levels (RPKM) in untreated ES cells for all 
genes on the microarray (n = 18,023), upregulated genes (n = 102), 
downregulated genes (n = 48), all germline genes (n = 865), upregulated 
germline genes (n = 8), and downregulated germline genes (n = 3). e, Box plot 
showing the extent of vitamin-C-induced demethylation (Z score) at gene 
promoters categorized as in d. The box plots have Tukey whiskers, a line for the 
median, and edges for the 25th and 75th percentiles. ****P < 0.0001 by 
analysis of variance (ANOVA) throughout the figure. 


(Tet double knockout) ES cells. Dot blot analysis reveals that Tet
double knockout ES cells show greatly reduced 5hmC signal that is 
not increased after vitamin C treatment (Fig. 3b). Residual signal in 
Tet double knockout ES cells may be due to antibody background or 
low-level Tet3 expression. Importantly, DIP followed by quantitative 
polymerase chain reaction (DIP-qPCR) reveals that in contrast to wild- 
type cells, vitamin C treatment of Tet double knockout ES cells does not 
affect 5hmC or 5mC levels at gene promoters (Fig. 3c). Furthermore,
vitamin-C-induced gene expression is significantly attenuated in Tet 
double knockout ES cells (Fig. 3d). The modest gene induction 
observed may be due to effects of vitamin C unrelated to DNA methy- 
lation, as already suggested by the analysis of Dnmt triple knockout ES 
cells (Supplementary Fig. 12b). Tet1−/− (Tet1 knockout) ES cells also
show an attenuated increase in global 5hmC, reduced promoter 
demethylation, and reduced gene induction in response to vitamin C. 
However, these effects are more subtle than in the Tet double knockout 
ES cells (Supplementary Fig. 14). These data indicate that the effects of 
vitamin C are Tet-dependent and are mediated by both Tet1 and Tet2.

Recent studies highlight differences in the methylomes of ES cells 
and blastocysts, with ES cells showing higher levels of methylation.
Using published genome-wide bisulphite sequencing data for ES cells 
and blastocysts, we investigated the relationship between
vitamin-C-induced demethylation and differences in methylation between ES
cells and the blastocyst. Analysis was performed on promoter CpG 
islands (CGIs) methylated in both our study and ref. 13 (Fig. 4a and 
Supplementary Table 4), which generally show greater methylation in 
ES cells than blastocysts (Fig. 4b). Interestingly, vitamin C induces 
greater demethylation at CGIs that are hypermethylated in ES cells 
Figure 3 | The effects of vitamin C are Tet-dependent. a, Dose-dependent
effect of vitamin C on in vitro Tet activity (n = 3 technical replicates, values are
mean ± s.d.). b, Dot blot analysis for 5hmC after 12 h vitamin C treatment.
DKO, double knockout; WT, wild-type. c, 5hmC (top) and 5mC (bottom)
DIP-qPCR after 12 or 72 h vitamin C treatment, respectively. An intergenic region
on chromosome 8 (Int8) is included as a negative control. d, Gene expression at
72 h of vitamin C treatment. Nanog, whose expression is not expected to
change, is included as a control. qRT-PCR data are expressed relative to untreated
wild-type cells. For c and d, n = 2 biological replicates, values are mean ± s.e.m.
*P < 0.05, **P < 0.01, ***P < 0.001 by t-test throughout the figure.


relative to blastocysts (Fig. 4c, d). Conversely, vitamin C has modest 
effects on CGIs with similar methylation levels in ES cells and blas- 
tocysts, such as imprinted regions (Fig. 4c, d). IAP ERVs, which are 
similarly methylated in both ES cells and the blastocyst, are also res- 
istant to vitamin-C-induced demethylation in ES cells (Supplementary 
Fig. 7d). These findings suggest that the effects of vitamin C are most 
pronounced at genes that show hypermethylation in ES cells compared 
to blastocysts. 

Next, methylation dynamics during development were determined 
using a reduced representation bisulphite sequencing (RRBS) data set (ref. 2).
Gene promoters that show 5mC loss with vitamin C are generally 
unmethylated up to the ICM stage and then undergo extensive methy- 
lation at the epiblast stage (Fig. 4e, blue). In contrast, gene promoters 
that show 5mC maintenance with vitamin C are enriched for germ line 
differentially methylated regions (DMRs) that maintain approxi- 
mately 50% methylation in somatic tissues throughout development 
Figure 4 | Vitamin C reduces DNA methylation in ES cells that is normally
gained post-implantation. a, Overlap of methylated CGIs in ES cells from this
study (5mC DIP-seq, RPKM > 0.5) and ref. 13 (whole-genome bisulphite
sequencing (Bis-seq), >25% methylated CpGs). Only CGIs found to be
methylated in both data sets were used for subsequent analysis in b-d. b, The box
plot shows CGI methylation levels in ES cells and blastocysts (values from ref. 13,
****P < 0.0001 by t-test). c, CGIs were first categorized as having 5mC loss with
vitamin C (>75% loss of 5mC at 72 h, n = 23) or 5mC maintenance with
vitamin C (<25% loss of 5mC at 72 h, n = 14) in ES cells, then plotted for
difference in methylation between untreated ES cells and blastocysts. CGIs
demethylated upon vitamin C treatment show significantly greater ES cell
hypermethylation compared to CGIs resistant to vitamin C (**P < 0.01 by t-test).
d, Genome browser views of a gene from each category described in
c. e, Methylation levels during development of all genes categorized in c for which
data exist (see ref. 2). f, Gene expression in ES cells cultured with or without
vitamin C compared to embryonic day 3.5 (E3.5) blastocysts. Data are expressed
relative to untreated ES cells (n = 2 biological replicates, values are mean ± s.e.m.).


(Fig. 4e, grey). Thus, it seems that in ES cells cultured in the absence of 
vitamin C, methylation accumulates at the subset of CGIs that would 
normally be de novo methylated in the epiblast. Analysis of published 
RNA-seq data indicates that several vitamin-C-induced genes are
also expressed in the ICM (Supplementary Table 5). Furthermore, 
we find that several of these germline genes are indeed expressed in 
the blastocyst at levels comparable to ES cells treated with vitamin C 
(Fig. 4f). Collectively, these findings suggest that vitamin C remodels 
DNA methylation and expression patterns in ES cells, resulting in a 
state reminiscent of the ICM of the blastocyst. 
Recently, it was reported that long-term culture of ES cells in 2i
medium induces a blastocyst-like state of global hypomethylation
relative to culture in serum. As the analyses described above were
carried out in 2i medium, they document effects of vitamin C beyond 
those of 2i. Nevertheless, we sought to distinguish the effects of 2i and 
vitamin C. We find that vitamin C induces a gain of 5hmC, loss of 
5mC, and induction of germline genes in both FBS and 2i medium 
(Supplementary Fig. 15a—d). In contrast, culture in 2i medium alone 
shows little to no effect over the same 72-h time course. The faster 
kinetics of action of vitamin C relative to 2i are probably due to their 
different mechanisms of action: vitamin C promotes Tet-mediated 
DNA demethylation, whereas 2i promotes passive loss of DNA methy- 
lation via upregulation of Prdm14 and repression of Dnmt3b and 
Dnmt3l (ref. 24) (Supplementary Fig. 15e). 

In summary, this work demonstrates that vitamin C alters the 
steady-state of DNA methylation and, in turn, the expression of germ- 
line genes in ES cells by enhancing Tet activity. Vitamin C reduces 
methylation at CGIs that normally gain methylation during the blas- 
tocyst to epiblast transition, promoting an ICM-like DNA methylation 
state in ES cells. Intriguingly, although human ES cells are normally 
cultured in medium containing vitamin C, they resemble mouse epi- 
blast cells more than ICM cells. Nevertheless, vitamin C may also have 
a role in human ES cells, as it has been shown that they accumulate 
DNA methylation after several passages in the absence of vitamin C, 
although the underlying mechanisms were not addressed. Notably,
imprinted regions and IAP retroelements, which are resistant to DNA 
demethylation in the early embryo, are also resistant to
vitamin-C-mediated demethylation. These regions show high levels of H3K9me3
(ref. 18), suggesting that this mark, or readers of this mark, may have a 
role in protecting against Tet-mediated demethylation. Vitamin C also 
improves the quality of iPS cells by preserving the fidelity of DNA 
methylation at imprinted regions. Furthermore, Tets are required for
methylation reprogramming during iPS cell generation and in the 
zygote. Vitamin C has also been reported to increase 5hmC in
mouse embryonic fibroblasts, suggesting that the mechanism characterized
here may be broadly applicable to other cell types. Much
work remains to be carried out to evaluate the ability of vitamin C to 
modulate Tet activity and DNA methylation in vivo. It will be of 
interest to investigate the role of vitamin C in contexts in which Tet 
enzymes have been implicated, such as in the zygote, germ line, blood
and brain. Potential roles for vitamin C in the clinic, including in in 
vitro fertilization culture medium or in cancers driven by aberrant 
DNA methylation, also deserve exploration. 


METHODS SUMMARY 


ES cells were cultured in feeder-free conditions in 2i medium. Vitamin C (L-ascorbic 
acid 2-phosphate, Sigma) was added daily at 100 μg ml−1. The Tet activity assay was
performed using recombinant human Tet1 catalytic domain with 5hmC generation
quantified by enzyme-linked immunosorbent assay (ELISA). Dot blot analyses 
were performed with serial dilutions of DNA using a Bio-Dot (Bio-Rad) apparatus 
and antibodies against 5hmC or 5mC. For gene expression analysis, total RNA was
isolated and hybridized to Affymetrix mouse gene 1.0 ST GeneChip arrays or 
analysed by qPCR with reverse transcription (qRT-PCR). For 5mC DIP-seq and 
5hmC DIP-seq, immunoprecipitated DNA was adaptor-ligated for paired-end 
sequencing on an Illumina HiSeq following the manufacturer’s recommended 
protocol. Sequence reads were aligned to the mm9 mouse reference genome and 
unique reads were used to calculate RPKM values in various regions including 
RefSeq promoters and CGIs. For pair-wise sample comparisons, an empirical Z
score was calculated assuming the distribution of RPKMs for each sample followed 
a Poisson model. 
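
As a rough illustration of the RPKM normalization named above, the following Python sketch computes reads per kilobase per million mapped reads for one region using the standard definition. The function names and example numbers are assumptions for illustration. The 0.5 RPKM cut-off mirrors the threshold the Methods use to call a promoter methylated.

```python
def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total reads must be positive")
    return reads_in_region * 1e9 / (region_length_bp * total_mapped_reads)


def is_methylated(rpkm_values, threshold=0.5):
    """True if every supplied 5mC DIP-seq RPKM value exceeds the threshold."""
    return all(value > threshold for value in rpkm_values)


if __name__ == "__main__":
    # Hypothetical promoter: 1 kb window, 50 reads, 10 million mapped reads.
    value_12h = rpkm(50, 1000, 10_000_000)
    value_72h = rpkm(30, 1000, 12_000_000)
    print(value_12h, value_72h, is_methylated([value_12h, value_72h]))
```

Dividing by both region length and sequencing depth makes values comparable across regions of different size and across libraries of different depth.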


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Cell culture. ES cells used in this study include: Oct4-GiP (129 × MF1), V6.5
(129/Sv × C57/BL6), Tet1−/−, Tet1−/−;Tet2−/−, J1 (129/Sv), and
Dnmt1−/−;Dnmt3a−/−;Dnmt3b−/−. All ES cells were cultured in feeder-free conditions on
0.1% gelatin-coated tissue culture plates. Unless indicated otherwise, ES cells were
cultured in 2i medium, which is composed of an N2B27 base medium supplemented
with the MEK (MAP-kinase kinase) inhibitor, PD0325901 (1 μM, Stemgent), the
GSK3β inhibitor, CHIR99021 (3 μM, Selleck Chemicals), and with ESGRO leukaemia
inhibitory factor (LIF) at 1,000 units per ml (Millipore). Vitamin C (L-ascorbic
acid 2-phosphate, Sigma, A8960) was added on day 1 after seeding at 100 μg ml−1.
Medium was replaced daily. For KSR versus FBS studies, a base medium composed 
of high glucose DMEM (Life Technologies), L-glutamine (2 mM, Life Technologies), 
sodium pyruvate (1 mM, Life Technologies), non-essential amino acids (1X, Life 
Technologies), 2-mercaptoethanol (1X, Millipore), and penicillin-streptomycin 
(1X, Life Technologies) was supplemented with LIF (1,000 units per ml) and either 
15% KSR (KSR medium) or 15% FBS (FBS medium). For differentiation experi- 
ments, Oct4-GiP ES cells maintained in 2i medium were treated with vitamin C for 
72h. Untreated and vitamin-C-treated ES cells were then transferred to 60-mm Petri 
dishes and cultured in suspension in FBS medium without LIF to induce embryoid 
body formation. Medium was replaced on day 3 and embryoid bodies were collected 
on day 5 of differentiation. To compare the effects of vitamin C in FBS medium 
versus 2i medium, J1 ES cells maintained in FBS medium were switched to FBS 
medium plus vitamin C, 2i medium, or 2i medium plus vitamin C and collected at 
12 and 72h after changing conditions. Media were replaced daily. 

Small molecule screen. Oct4-GiP ES cells were cultured in FBS medium. Small 
molecules were added at the time of seeding. The small molecules used included 
valproic acid (2 mM, Calbiochem), trichostatin A (20 nM, Sigma), PD0325901
(1 μM, Stemgent), CHIR99021 (3 μM, Selleck Chemicals), forskolin (10 μM,
Sigma), A-83-01 (1 μM, Stemgent), RG108 (250 nM, Stemgent), 5-azacytidine
(1 μM, Sigma), BIX01294 (0.5 μM, Stemgent), UNC0638 (0.5 μM, Sigma), insulin
(5 μg ml−1, Sigma), vitamin C (100 μg ml−1, Sigma) and the lipid cocktail
Albumax II (5 mg ml−1, Life Technologies). The KSR components tested include
vitamin C, Albumax II, and insulin (international patent application
WO 98/30679).

Vitamin C quantification assay. The amount of vitamin C in 2i medium was 
determined using an Ascorbic Acid Assay Kit (Abcam, ab65656). Known amounts 
of vitamin C (L-ascorbic acid, Sigma, A4544) were added to the medium and tested 
as controls. 

Antioxidant treatment. Oct4-GiP ES cells were cultured in 2i medium and treated 
for 24 h with antioxidants. The antioxidants tested included a modified
glutathione, glutathione reduced ethyl ester (GMEE), at 1.5 μg ml−1, sodium selenite
(20 nM), vitamin B1 (9 μg ml−1), vitamin E (25 μM), L-carnitine hydrochloride
(15 μg ml−1), and α-lipoic acid (5 μg ml−1). All reagents were purchased from
Sigma. After treatment, DNA was isolated and assayed by dot blot analysis.

Immunofluorescence staining. Cells were fixed in 4% paraformaldehyde for
15 min. After washing three times with PBS, cells were blocked with 5% FBS in
PBST (PBS + 0.5% Tween 20) for 2h at room temperature (20-25 °C). Primary 
antibodies were diluted in blocking solution and incubated with cells overnight at 
4°C. Cells were then washed three times in PBS and incubated for 2h at room 
temperature with secondary antibodies diluted in blocking solution. Cells were 
washed three times in PBS before imaging. Primary antibodies included DAZL 
(1:200, Abcam) and 5-hydroxymethylcytosine (1:100, Active Motif). 594-conjugated 
chicken anti-rabbit (1:1,000, Life Technologies) was used as a secondary antibody.

Genomic DNA preparation. DNA was isolated using a Qiagen Gentra Puregene
Kit or the phenol-chloroform-isoamyl alcohol method. RNase A digestion was 
included in the isolation procedure. 

Dot blot analysis. Isolated DNA (1 μg per sample) was denatured in 0.1 M NaOH
for 10 min at 95 °C. Samples were neutralized with 1 M NH4OAc on ice, and then
serially diluted twofold. DNA samples were spotted on a nitrocellulose membrane
using a Bio-Dot apparatus (Bio-Rad). The blotted membrane was washed in 2×
SSC buffer, dried at 80 °C for 5 min, and UV cross-linked at 120,000 μJ cm−2.
The membrane was then blocked in Odyssey buffer (Li-Cor) diluted 1:1 in 
PBS (Odyssey:PBS) overnight at 4°C. Mouse anti-5-methylcytosine monoclonal 
antibody (Active Motif, 1:500) or rabbit anti-5-hydroxymethylcytosine polyclonal 
antibody (Active Motif, 1:5,000) in Odyssey:PBS was added for 3h at room tem- 
perature. The membrane was washed for 10 min three times in PBS, and then 
incubated with either HRP-conjugated sheep anti-mouse immunoglobulin-G 
(IgG) (GE, 1:10,000) or HRP-conjugated goat anti-rabbit IgG (Abcam,
1:10,000) secondary antibodies in Odyssey:PBS for 3h at room temperature. 
The membrane was then washed for 10 min three times in PBS and visualized 
by chemiluminescence with GE ECL Plus. 

qRT-PCR. Total RNA was isolated from cultured cells using Qiagen RNeasy with 
on-column DNase I treatment. Complementary DNA (cDNA) was generated 
from 1 μg of RNA using random hexamers to prime the reaction. The cDNA
was used as template for qRT-PCR. qRT-PCR was performed in combination 
with the KAPA SYBR Fast ABI Prism qPCR kit on an Applied BioSystems 7900HT 
sequence detection system. Primer sequences are listed in Supplementary Table 6. 
The relative amount of each gene was normalized using two housekeeping genes 
(L7 (also known as Rpl7) and Ubb), unless otherwise specified.
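
One common way to express such normalized qRT-PCR data is the 2^−ΔΔCt calculation. The sketch below assumes that approach, which the text does not specify, with the two housekeeping genes averaged at the Ct level and equal amplification efficiency for all primers. All Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_ctrl, ct_refs_ctrl):
    """Fold change of a target gene versus a control sample (2^-ddCt).

    ct_target / ct_target_ctrl: Ct of the target gene in the treated and
        control samples.
    ct_refs / ct_refs_ctrl: Ct values of the housekeeping genes (for example
        L7 and Ubb) in the same two samples.
    Assumes 100% amplification efficiency, i.e. a doubling per cycle.
    """
    delta_treated = ct_target - mean(ct_refs)
    delta_control = ct_target_ctrl - mean(ct_refs_ctrl)
    return 2 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values for one target gene with and without vitamin C.
    print(relative_expression(24.0, [18.0, 20.0], 27.0, [18.0, 20.0]))
```

Averaging Ct values of the reference genes corresponds to taking the geometric mean of their expression levels, which limits the influence of either reference alone.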

Blastocyst expression analysis. C57BL/6 (Simonsen) female mice were injected 
with 7.5 IU of pregnant mare gonadotropin (HUMC-NHPP) followed by 7.5 IU
human chorionic gonadotropin (Sigma) 46 h later. Primed females were mated
with DBA/2 (Simonsen) male mice. Detection of a vaginal plug was defined as 
0.5 days post coitum. Embryonic day 3.5 blastocysts were obtained by flushing the 
uterus of superovulated females with M2 medium (Sigma). Total RNA was isolated
from collected blastocysts using an RNeasy Micro Kit (Qiagen) with on-column
DNase I treatment. RT-PCR was performed as described above. The relative 
amount of each gene was normalized to L7. All animal work was conducted in 
accordance with protocols approved by the Institutional Animal Care and Use 
Committee at the University of California, San Francisco. 

Microarray analysis. Total RNA was isolated from biological triplicates of
Oct4-GiP ES cells treated with or without vitamin C for 72 h and hybridized to
Affymetrix mouse gene 1.0 ST GeneChip arrays (Affymetrix). Only annotated 
probes were considered for analysis. DChip software was used for gene expression 
statistical analysis. 

Gene ontology analysis. Gene ontology functional annotation was performed in 
DAVID. 

In vitro recombinant Tet1 activity assay. Recombinant human TET1 catalytic
domain (1.2 μg) was incubated for 10 min at 37 °C with 178.2 ng of biotinylated
DNA substrate in 50 mM HEPES (pH 8.0), 50 mM NaCl, 1 mM α-ketoglutarate,
3.7 μM ammonium iron(II) sulphate hexahydrate, 0.1 mg ml−1 BSA, 1 mM ATP
and titrating concentrations of vitamin C (Sigma, A5960), DTT (Sigma, D0632) or
reduced glutathione (Sigma, G6529). The reaction was quenched by adding EDTA
(11 mM). Each sample was loaded into a 384-well streptavidin coated plate in
triplicates (25 μl per well) and left overnight at 4 °C on a rotating platform. The next
morning the wells were washed three times with 90 μl of 1× Tris-buffered saline
(TBS) with 0.1% Tween (TBST). The wells were then incubated with 50 μl per well
of anti-5hmC antibody (Active Motif) diluted 1:5,000 in TBST with 5% milk for
1 h at room temperature on a rotating platform. Next the wells were washed three
times with 90 μl of TBST, followed by incubation with 50 μl of goat anti-rabbit
peroxidase diluted 1:3,000 in TBST with 5% milk for 1 h on a rotating platform.
The wells were then washed three times with TBST, and 30 μl of TMB substrate
reagent mix (BD Biosciences) were added to each well. The reaction proceeded in
the dark for 10 to 12 min, and was then quenched with 20 μl of 25% sulphuric acid.
The absorbance of each well was measured at 450 nm. A standard curve of twofold
dilutions from a fully hydroxymethylated version of the DNA substrate was used
to obtain a linear regression, which was used to calculate pmols of 5hmC formed in
the reaction from absorbance values. The data are presented as fold change in
5hmC relative to untreated. 
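
The standard-curve step can be sketched as an ordinary least-squares fit of absorbance against known 5hmC amounts, inverted to convert sample absorbances to pmol. This is an illustrative Python sketch only; the function names and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_pmol(absorbance, slope, intercept):
    """Invert the standard curve to estimate pmol of 5hmC."""
    return (absorbance - intercept) / slope


def fold_change(treated_pmol, untreated_pmol):
    """5hmC formed in a treated reaction relative to the untreated one."""
    return treated_pmol / untreated_pmol


if __name__ == "__main__":
    # Hypothetical twofold dilution series of the standard (pmol) and A450.
    standard_pmol = [0.25, 0.5, 1.0, 2.0]
    standard_a450 = [0.20, 0.30, 0.50, 0.90]
    slope, intercept = fit_line(standard_pmol, standard_a450)
    untreated = absorbance_to_pmol(0.30, slope, intercept)
    with_vitc = absorbance_to_pmol(0.70, slope, intercept)
    print(fold_change(with_vitc, untreated))
```

The inversion is only meaningful for absorbances inside the range spanned by the standards, where the linear fit was established.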

Bisulphite Sanger sequencing. Oct4-GiP ES cells treated with or without 
vitamin C for 72 h were analysed for promoter methylation. To analyse the methylation
status of the selected genes, 0.2 μg of genomic DNA was subject to sodium
bisulphite conversion using the EZ DNA Methylation-Gold kit (Zymo Research). 
Primers specific for the genes analysed (Supplementary Table 6) were employed in 
nested or semi-nested PCR reactions. PCR products were cloned through TA 
cloning using the pGEM-T easy kit (Promega) and individual inserts were 
sequenced using Genewiz Sequencing. Data were analysed using Quantification 
Tool for Methylation Analysis. The percentage of methylated CpGs sequenced is
presented for each set of samples. 
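
The percentage of methylated CpGs can be tallied from aligned clone sequences as sketched below. After bisulphite conversion, a C retained at a CpG site indicates methylation and a T indicates an unmethylated cytosine. The sketch assumes top-strand clones already aligned to the unconverted reference without gaps. It is not the Quantification Tool for Methylation Analysis itself, and the sequences are made up.

```python
def cpg_positions(reference):
    """Indices of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def percent_methylated(reference, clones):
    """Percentage of informative CpG calls that are methylated.

    reference: unconverted genomic sequence of the amplicon.
    clones: bisulphite-converted clone reads, same length as the reference.
    Calls other than C or T at a CpG position are ignored as uninformative.
    """
    methylated = 0
    informative = 0
    for clone in clones:
        for i in cpg_positions(reference):
            base = clone[i].upper()
            if base == "C":      # protected from conversion: methylated
                methylated += 1
                informative += 1
            elif base == "T":    # converted: unmethylated
                informative += 1
    if informative == 0:
        return float("nan")
    return 100.0 * methylated / informative


if __name__ == "__main__":
    # Hypothetical amplicon with two CpG sites and three sequenced clones.
    print(percent_methylated("ACGTCGA", ["ACGTCGA", "ATGTTGA", "ACGTTGA"]))
```

Each clone derives from one DNA molecule, so pooling calls across clones estimates the methylation level of the cell population at that promoter.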

5mC and 5hmC DIP-qPCR. DIP was performed using the Diagenode 
MagMeDIP or hMeDIP Kit with minor modifications. DNA was sonicated into 
short fragments (100 to 1,000 base pairs (bp)) with a Diagenode Bioruptor for 
20 min with 15-s on, 15-s off cycles at low power. Sonicated DNA was heat
denatured at 95 °C for 10 min. Sonicated DNA (1 μg) was immunoprecipitated
with 1 μg of mouse anti-5-methylcytosine monoclonal antibody (Active Motif,
1 μg μl−1) or 2.5 μg of mouse anti-5-hydroxymethylcytosine monoclonal antibody
(Diagenode, 1 μg μl−1). After a 2-h incubation at 4 °C, Magbeads (Diagenode)
were added to the DNA-antibody mixture and samples were incubated at 4 °C 
overnight. Isolation of immunoprecipitated DNA was performed according to the 
kit instructions. qPCR was performed in combination with the KAPA SYBR Fast 
ABI Prism qPCR kit on an Applied BioSystems 7900HT Sequence Detection
System. Primer sequences are listed in Supplementary Table 6. 

5hmC and 5mC DIP-seq. Oct4-GiP ES cells were treated with vitamin C (100 μg ml−1)
for 12 and 72 h. For each sample, 12 μg of genomic DNA was isolated, split into
three replicates of 4 μg each and sonicated to approximately 100 to 500 bp on a
Covaris E210 platform (75 s, 10% duty cycle). Sheared DNA was end-repaired,
A-tailed, and ligated to custom paired-end adapters as described. Ligated genomic
DNA was size selected (100 to 300 bp) by 8% PAGE (Novex, Invitrogen) to
remove unligated adapters. Replicates were pooled and subjected to qPCR using 
truncated PE1.0/2.0 PCR primers to assess ligation efficiency and quantified by 
fluorometer (Qubit, Life Technologies). Adaptor-ligated DNA was heat denatured 
at 95°C for 10 min, rapidly cooled on ice, and immunoprecipitated with a mouse 
monoclonal anti-methylcytidine antibody (1 mg ml−1, Eurogentec) or a mouse
monoclonal anti-5-hydroxymethylcytidine (5hmC) antibody (1.6 mg ml−1, Diagenode).
Primary antibodies were added at 1 μl μg−1 of DNA and samples were
incubated overnight at 4 °C with rocking agitation in 500 μl IP buffer (10 mM
sodium phosphate buffer, pH 7.0, 140 mM NaCl, 0.05% Triton X-100). To recover
the immunoabsorbed DNA fragments, 1 μl of rabbit anti-mouse IgG secondary
antibody (2.5 mg ml−1, Jackson ImmunoResearch) and 100 μl Protein A/G beads
(Pierce Biotechnology) were added and incubated for 2h at 4°C with agitation. 
After immunoprecipitation beads were resuspended in TE buffer with 0.25% SDS 
and 0.25 mg ml−1 proteinase K for 2 h at 55 °C and then allowed to cool to room
temperature. Immunoprecipitated and supernatant DNA were purified using 
Qiagen MinElute columns and eluted in 16 μl EB (Qiagen). Sequencing libraries
were generated by 10 to 15 cycles of PCR using custom indexed paired-end Illumina 
PCR primers”. The resulting reactions were purified over Qiagen MinElute col- 
umns, after which a final size selection was performed by electrophoresis in 8% 
PAGE. Libraries were quality controlled by spectrophotometry and Agilent DNA 
Bioanalyzer analysis. An aliquot of each indexed library was pooled by sample and 
sequenced two per lane on an Illumina HiSeq2000 platform (2 × 76 nt + 7 nt) 
following the manufacturer’s recommended protocol and V3 chemistry (Illumina). 
Bioinformatics. Sequence reads (75 bp paired-end) were aligned to the mouse 
reference genome (mm9) using BWA v0.5.9 (ref. 35) and default parameters. 
Samtools* and Picard (http://picard.sourceforge.net/) were used to sort and mark 
duplicate reads, respectively. Reads having identical coordinates were collapsed 
into a single read and reads with mapping qualities ≥ 5 were passed to FindPeaks 4.01 
(ref. 37) for generation of unthresholded and thresholded (false discovery rate 
(FDR) < 0.01) coverage wig files to be visualized in the UCSC genome browser™. 
To quantify the strength of 5hmC and 5mC DIP-seq marks, these wig files were 
used to calculate reads per kilobase per million reads (RPKM)*?” values in various 
regions of interest including RefSeq"! promoters and CGIs (downloaded from 
http://genome.ucsc.edu on May 15, 2012). For pair-wise sample comparisons, 
an empirical Z score was calculated assuming the distribution of RPKMs for each 
sample followed a Poisson model: Z score = (RPKM_A − RPKM_B)/√(RPKM_A − r_AB·RPKM_B), 
where RPKM_A and RPKM_B are RPKMs in the region of interest of 
A and B samples, respectively, and r_AB = N_A/N_B, where N_X is the total number of 
aligned reads used for normalization. For promoter analysis, promoters were 
defined as ±500 bp from the TSS. Methylated promoters were defined as having 
>0.5 RPKM values in both 12 and 72 h untreated 5mC DIP-seq samples 
(n = 1,045). For previously published Tet1 ChIP-seq (chromatin immunoprecipitation 
followed by sequencing) data (ref. 10), short read data were downloaded 
from the NCBI GEO archive (GSE24843) and remapped to mm9 (NCBI 37). 
ChIP-seq reads with identical coordinates were collapsed into a single read. 
Tet1 reads were directionally extended by 300 bp using FindPeaks and unthresholded 
coverage wig files were generated to create Tet1 heatmaps in methylated 
promoters using ChAsE”, an interactive analysis and exploration tool for epigenetic 
data. Tet1 ChIP-seq data were corrected by control data. To determine which 
retroelement subfamilies show global 5hmC and 5mC DIP-seq changes in vit- 
amin-C-treated compared to untreated ES cells, RPKM values were calculated for 
all subfamilies in each 5hmC and 5mC DIP-seq data set. To generate RPKM 
values, we calculated the total number of reads aligned to each retroelement 
subfamily using both unique and multiple-aligned reads, and normalized the total 
number of reads by total genomic bp per subfamily and total number of reads for 
each data set. For whole genome bisulphite-seq analysis of CGIs, the percentage of 
methylation for each individual cytosine was downloaded from ref. 13 (http:// 
www.nodai-genome.org/mouse_en.html). CGI methylation levels were calculated 
as the average of the percentage of methylation for all cytosines in each CGI region. 
Analysis was restricted to CGIs ±1,000 bp from a TSS. 
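
As a rough illustration of the RPKM normalization used throughout this section, the following Python sketch computes reads per kilobase per million reads for a single region. The function names, the symmetric 500-bp promoter window and the example counts are hypothetical and are not taken from the study; only the standard definition of RPKM is assumed.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total reads must be positive")
    kilobases = region_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


def promoter_window(tss, flank_bp=500):
    """Return (start, end) of a window of flank_bp on either side of a TSS.

    The symmetric window is an assumption made for this sketch; the start
    is clipped at 0 so that it never becomes negative.
    """
    return max(0, tss - flank_bp), tss + flank_bp


# Hypothetical example: 50 reads in a 1,000-bp promoter window,
# 20 million mapped reads in the data set.
start, end = promoter_window(10_000)
value = rpkm(50, end - start, 20_000_000)
print(start, end, value)
```

The same function could be applied to a retroelement subfamily by passing the total genomic bp of the subfamily as the region length.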
Molecular basis of binding between novel human 
coronavirus MERS-CoV and its receptor CD26 


Guangwen Lu, Yawei Hu, Qihui Wang, Jianxun Qi, Feng Gao, Yan Li, Yanfang Zhang, Wei Zhang, Yuan Yuan, 
Jinku Bao, Buchang Zhang, Yi Shi, Jinghua Yan & George F. Gao 


The newly emergent Middle East respiratory syndrome coronavirus 
(MERS-CoV) can cause severe pulmonary disease in humans’”, repre- 
senting the second example of a highly pathogenic coronavirus, the 
first being SARS-CoV’. CD26 (also known as dipeptidyl peptidase 
4, DPP4) was recently identified as the cellular receptor for MERS- 
CoV*. The engagement of the MERS-CoV spike protein with CD26 
mediates viral attachment to host cells and virus-cell fusion, thereby 
initiating infection. Here we delineate the molecular basis of this 
specific interaction by presenting the first crystal structures of both 
the free receptor binding domain (RBD) of the MERS-CoV spike 
protein and its complex with CD26. Furthermore, binding between 
the RBD and CD26 is measured using real-time surface plasmon 
resonance with a dissociation constant of 16.7 nM. The viral RBD 
is composed of a core subdomain homologous to that of the SARS- 
CoV spike protein, and a unique strand-dominated external receptor 
binding motif that recognizes blades IV and V of the CD26 β-propeller. 
The atomic details at the interface between the two binding entities 
reveal a surprising protein-protein contact mediated mainly by 
hydrophilic residues. Sequence alignment indicates, among beta- 
coronaviruses, a possible structural conservation for the region 
homologous to the MERS-CoV RBD core, but a high variation in 
the external receptor binding motif region for virus-specific patho- 
genesis such as receptor recognition. 

The recent identification of a novel coronavirus, MERS-CoV— 
which, as of May 15th 2013, had infected 40 patients with a total of 
20 fatalities—has drawn worldwide attention as a potential cause of a 
future pandemic. Unlike most coronaviruses circulating in humans 
that only cause mild respiratory illness®, MERS-CoV possibly repre- 
sents a second reported coronavirus of severely high virulence after 
SARS-CoV, which caused over 8,000 infection cases globally in 2003, 
with more than 800 deaths’. The clinical manifestations of MERS-CoV 
infection include fever, cough, acute respiratory distress syndrome 
and, in some cases, accompanying renal failure’, and are very similar 
to those caused by SARS-CoV. However, the novel coronavirus diverges 
from SARS-CoV in genomic sequence, and is much more closely related 
to the bat-derived HKU4 and HKU5 coronaviruses”*. Consistent with 
phylogenetic analysis, MERS-CoV does not use the SARS-CoV receptor, 
angiotensin converting enzyme 2 (ACE2), as its entry receptor’; rather, a 
recent study showed that it uses human CD26 for this purpose*. CD26 is 
the third peptidase to be identified as a functional coronavirus receptor, 
the others being aminopeptidase N (ANPEP, also known as APN and 
CD13)'!°" and ACE2 (ref. 12). 

The recognition of CD26 by MERS-CoV is mediated by virus surface 
spike (S) protein*. As with other coronaviruses, the MERS-CoV S protein 
would be cleaved in host cells into S1 and S2 subunits (Fig. 1a). S1 
engages the receptor* whereas S2, with typical sequence motifs homologous 
to those identified as the heptad repeats in class I enveloped 
viruses'*"'°, should mediate membrane fusion. The exploitation of the 
virus-receptor interaction and thus of the intervention strategies 
requires an atomic delineation of the receptor-binding properties of 
S1. On the basis of previous studies, the receptor attachment sites of 
coronavirus S1 subunits might locate to either the amino-terminal 
(such as in murine hepatitis virus'®) or the carboxy-terminal (such as 
in, for example, SARS-CoV” and human coronavirus NL63 (ref. 18)) 
domain. We therefore tested individually the binding of MERS-CoV S1 
and its N- and C-terminal-domain proteins to cell-surface-expressed 
CD26 molecules. The receptor-binding capacity was attributed to the 
C-terminal amino acids 367-606 of MERS-CoV S1 (Fig. 1b). We hereafter 
refer to this domain as the RBD. The potent interaction between MERS-CoV 
RBD and CD26 was further demonstrated by surface plasmon 
resonance assays, in which CD26 binds to MERS-CoV RBD with a 
dissociation constant (Kd) of about 16.7 nM (kon, 1.79 × 10⁵ M⁻¹ s⁻¹; 
koff, 2.99 × 10⁻³ s⁻¹), but does not bind to the RBD of SARS-CoV 
(Fig. 1c). 
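
For a 1:1 interaction, the dissociation constant is the ratio of the dissociation and association rate constants, Kd = koff/kon. The short Python sketch below checks that relationship against the reported value; it assumes rate constants of 1.79 × 10⁵ M⁻¹ s⁻¹ and 2.99 × 10⁻³ s⁻¹ (the exponents are our reading of the reported figures), and the function name is hypothetical.

```python
def dissociation_constant(k_off, k_on):
    """Equilibrium dissociation constant K_d (in M) for a 1:1 interaction."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


# Assumed rate constants (see the lead-in above).
k_on = 1.79e5    # association rate constant, M^-1 s^-1
k_off = 2.99e-3  # dissociation rate constant, s^-1

k_d_nM = dissociation_constant(k_off, k_on) * 1e9  # convert M to nM
print(f"K_d = {k_d_nM:.1f} nM")
```

With these inputs the ratio comes out at roughly 16.7 nM, in line with the dissociation constant quoted in the text.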

We crystallized MERS-CoV RBD and solved its structure at a resolution 
of 2.5 Å (Supplementary Table 1). Two molecules of essentially 
the same structure are present in the asymmetric unit. Each molecule 
contains 208 consecutive density-traceable amino acids from V381 to 
L588. A Dali’® search within the Protein Data Bank (PDB) revealed 
clear structural homology between MERS-CoV RBD and SARS-CoV 
RBD (PDB code, 2DD8; Z score, 15.1). We therefore divided the MERS-CoV 
RBD structure into two subdomains: a core and an external 
β-sheet, using the structure of SARS-CoV RBD as a reference. The core 
subdomain reveals a five-stranded antiparallel β-sheet (β1, β3, β4, β5 
and β10) in the centre. The connecting helices (four α-helices: α1–4 and 
two 3₁₀-helices: η1 and η2) and two small β-strands (β2 and β11) 
further decorate the sheet on both sides, together forming a globular 
fold. Three disulphide bonds, connecting C383 to C407, C425 to C478, 
and C437 to C585, respectively, stabilize the core-domain structure 
from the interior. At the solvent-exposed side, the RBD termini are 
clinched adjacent to each other (Fig. 2a, b). This subdomain fold is very 
similar to that of the SARS-CoV RBD core (a root mean squared deviation 
of 2.79 Å for 76 Cα pairs). Superimposition of the two structures 
reveals a well-aligned centre sheet and homologous peripheral helices 
and strands, although several intervening loops are observed to exhibit 
large conformational variance (Fig. 2c). 
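
The root mean squared deviation quoted above is the square root of the mean squared distance between paired atoms after superimposition. The Python sketch below shows only that final step, assuming the two coordinate sets have already been superimposed and paired; the function name and the toy coordinates are hypothetical and unrelated to the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of paired (x, y, z) coordinates.

    Assumes the two structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example with two atom pairs displaced by 3 and 4 distance units.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(a, b))
```

In practice the superimposition itself (for example, a least-squares fit over the Cα pairs) would be performed by structural alignment software before this calculation.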

The external subdomain of MERS-CoV RBD is mainly a β-sheet 
structure with three large (β6, β8 and β9) and one small (β7) strand 
arranged in an antiparallel manner. It is anchored to the RBD core 
through the β5/6, β7/8 and β9/10 intervening loops, which touch the 
core subdomain like a clamp at both the top and bottom positions. 
Two small 3₁₀ helices (η3 and η4) and most of the connecting loops in 
this subdomain locate on the interior side of the sheet, hence exposing 
a flat exterior sheet-face to the solvent. Residues C503 and C526 form 
Figure 1 | Identification of the MERS-CoV RBD. a, A schematic 
representation of the MERS-CoV S protein. The N-terminal domain (NTD) 
and RBD are defined on the basis of a pairwise sequence alignment with the 
N-terminal galectin-like domain of murine hepatitis virus S and the RBD of 
SARS-CoV S, respectively. The remaining domain elements are 
bioinformatically defined on the basis of the web-server predictions (signal 
peptide (SP), SignalP 4.0 server; transmembrane domain (TM), TMHMM 
server; heptad repeats 1 and 2 (HR1 and HR2), Learncoil-VMF program). 

? denotes the presumed/estimated S1/S2 cleavage site. A previous prediction* 
indicates cleavage between R751 and S752 with a 602-residue S2. However, a 
recent study** revealed a spike C-terminal domain (possibly S2) of ~100 kDa, 
indicating a cleavage site upstream of R751/S752. b, A flow cytometric assay of 


the fourth disulphide bond, linking the η3 helix to strand β6 (Fig. 2a, b). 
With no observable structural homology (Fig. 2c), the external subdomains 
of MERS-CoV and SARS-CoV RBDs are topological equivalents, 
both being present as an ‘insertion’ between the equivalent core-strands 
(strands β5 and β10 in MERS-CoV, and β6 and β9 in SARS-CoV) 
(Supplementary Fig. 1). 

To elucidate the structural basis of the virus—receptor engagement, 
we further prepared the RBD-CD26 complex by in vitro mixture of the 
two proteins and then purification on a gel filtration column. Con- 
sistent with the high binding affinity between MERS-CoV RBD and 
CD26, the complex is easily obtainable and stable (Supplementary Fig. 2). 
The complex structure was solved at 2.7 Å resolution (Supplementary 
Table 1) with one RBD binding to a single CD26 molecule in the asymmetric 
unit. The receptor, as shown in previous reports’, is composed 
of an eight-bladed β-propeller domain and an α/β hydrolase 
domain. MERS-CoV RBD binds to the side-surface of the CD26 β-propeller, 
recognizing blades IV and V and a small bulged helix in the 
blade-linker. As for the viral ligand, the entire receptor binding site 
locates in the external subdomain and to the solvent-exposed sheet- 
face, qualifying the subdomain as the receptor binding motif (RBM) 
(Fig. 3a). Overall, engagement of the receptor does not induce obvious 
conformational changes in RBM, although small structural variance 
could be observed for the tip-loops. The η2–α4 loop in the RBD core, 
however, unexpectedly exhibits a large conformational difference 
between the free and the bound structures (Supplementary Fig. 3). 
We believe this is due to a crystal contact present in the free RBD struc- 
ture, which is interrupted in the complex crystal by the engaging receptor. 
the Fc-fused S protein or its subdomain proteins involved in CD26 binding. 
Mock-transfected baby hamster kidney (BHK) cells or BHK cells transfected 
with CD26-expressing plasmid (BHK-CD26) were tested with the individual 
Fc-fusion proteins or an anti-CD26 antibody (anti-CD26 IgG). For each test, 
the secondary antibodies (anti-goat IgG or anti-mouse IgG) were used as the 
negative control. The profiles are shown. From left to right: BHK cells with the 
indicated Fc-fusion proteins or antibodies, BHK-CD26 with anti-CD26 
antibody, BHK-CD26 with Fc-fused S1, BHK-CD26 with Fc-fused NTD, BHK-CD26 
with Fc-fused RBD. c, A surface plasmon resonance assay characterizing the 
specific binding between CD26 and MERS-CoV RBD. The profiles are shown. 
Left, human ACE2 to SARS-CoV RBD; middle, CD26 to MERS-CoV RBD; top 
right, CD26 to SARS-CoV RBD; bottom right, human ACE2 to MERS-CoV RBD. 


CD26 is a type II transmembrane protein. It is present as a homo- 
dimer on the cell surface”. The dimerization of the peptidase relies 
on broad intermolecule contacts contributed by the hydrolase domain 
and the extended strands in blade IV of the β-propeller**". A lateral 
binding of MERS-CoV RBD to CD26 would therefore not disrupt 
CD26 dimerization. Accordingly, a similar U-shaped CD26 dimer 
could be generated by symmetry operations of the complex structure. 
The viral ligand locates at the membrane-distal tip of the dimer, cor- 
responding well to a trans interaction between the virus and the recep- 
tor (Fig. 3b). Considering that the RBD N and C termini are on the 
same side distant from CD26, it is unlikely that the remaining S 
domains would contact the receptor molecule. The binding mode 
revealed by the complex structure is also in good accordance with a 
previous study showing that the virus—receptor interaction is inde- 
pendent of the peptidase activity of CD26 (ref. 4). The bound RBD 
is far away from interfering with either the substrate/product accessing 
tunnels or the catalytic centre*®*" (Fig. 3b). 

Overall, a surface area of 1203.4 and 1113.4 Å² in CD26 and MERS-CoV 
RBD, respectively, is buried by complex formation (Fig. 4a). 
Scrutinization of the binding interface reveals a group of hydrophilic 
residues at the site, forming a polar-contact (H-bond and salt-bridge) 
network. These interactions are predominantly mediated by the res- 
idue side chains (including RBD Y499 with CD26 R336, N501 with 
Q286, K502 with T288, D510 with R317, E513 with Q344, and D539 
with K267), although CD26 L294 and RBD D510 are observed to 
contact RBD R542 and CD26 Y322, respectively, through the main- 
chain oxygen atom (Fig. 4b). In addition, the bulged helix in CD26 
Figure 2 | The overall structure of MERS-CoV RBD. a, A cartoon 
representation of the RBD structure. The secondary structural elements are 
labelled according to their occurrence in sequence. The disulphide bonds 
(marked with Arabic numbers 1-4) and N-glycan linked to N410 are shown as 
orange and green sticks, respectively. Core subdomain, magenta; external 
subdomain, cyan. The N and C termini are labelled. b, An amino acid sequence 
alignment between MERS-CoV and SARS-CoV RBDs. The hollow boxes and 


properly positions three hydrophobic residues A291, L294 and I295 
into close proximity with the RBD amino acids Y540, W553 and V555, 
forming a hydrophobic centre at the interface (Fig. 4c). Further virus–receptor 
contacts include V341 and I346 of CD26 packing against 
P515 and the apolar carbon atoms of R511 and E513 in RBD 
(Fig. 4d), and a CD26 N229-linked carbohydrate moiety interacting 
with RBD amino acids W535 and E536 (Fig. 4e). Overall, the virus— 
receptor engagement is dominated by the polar contacts mediated by 
the hydrophilic residues, and mutations of those in RBD (six alanine 
substitutions and one Y499F mutation of the CD26-interacting amino 
acids) completely abrogated its interaction with CD26 (Supplementary 
Fig. 4). The features of these residue interactions are very similar to 
those mediating the interaction between adenosine deaminase (ADA) 
and CD26 (ref. 23). By a pairwise comparison, we unexpectedly found 




arrows indicate α/3₁₀ helices and β-strands, respectively, and are coloured as in 
a. To facilitate comparison, the secondary-structure elements of SARS-CoV 
RBD (PDB code, 2DD8) are marked with spiral (helices) and arrow (strands) 
lines below the sequence. The cysteine residues that form disulphide bonds are 
labelled as in a, and residue N410 with a star. c, A structural alignment between 
MERS-CoV (magenta for core and cyan for external subdomains) and SARS- 
CoV (green) RBDs. 


that all those CD26 residues identified in the virus—receptor interface 
are also involved in ADA binding, indicating a competition between 
ADA and the virus for CD26 receptor. As the ADA-CD26 interaction 
is shown to induce co-stimulatory signals in T cells”, this may indicate 
a possible manipulation of the host immune system by MERS-CoV 
through competition for the ADA-recognition site. It is also note- 
worthy that those CD26 residues involved in RBD binding are highly 
conserved between human and bat, with only two variations (I295T 
and R317Q), explaining the capability of MERS-CoV using bat CD26 
for cell entry* (Supplementary Fig. 5). 

Coronaviruses can be categorized into three main genera or groups 
(group 1 (alpha), group 2 (beta) and group 3 (gamma) coronaviruses). 
Both MERS-CoV and SARS-CoV belong to the betacoronavirus genus, 
but are classified into different lineage subgroups (subgroup 2b for 


Figure 3 | The complex structure of MERS-CoV RBD bound to CD26. a, A 
cartoon representation of the complex structure. For clarity, only the 
β-propeller domain of CD26 (grey) is shown. Blades IV, V and the intervening 
IV/V linker that recognize RBD are highlighted in green, blue and red, 
respectively. The core subdomain and external RBM are coloured magenta and 
cyan, respectively. The right panel is yielded by clockwise rotation of the left 
panel along a longitudinal axis in the page-face. b, A symmetry-related CD26 


dimer observed in the complex crystal. The two-fold axis is shown as an upright 
arrow. The transmembrane topology of CD26 is indicated with a modelled 
lipid-bilayer membrane. In CD26, the propeller and side openings indicated as 
the substrate entrance/exit tunnels are marked with arrows, and the catalytic 
triad residues are highlighted as spheres. Colour selections are the same as in 
a, and the CD26 α/β hydrolase domain is shown in orange. The N and C 
termini are labelled. 
Figure 4 | The atomic interaction details at the binding interface. a, An 
overview of the binding interface. CD26 and RBD are shown in surface and 
cartoon representations, respectively, and are coloured as in Fig. 3. The 
carbohydrate moiety linked to CD26 N229 is shown as green sticks. The 
contacting sites (each allocated with an Arabic number 1-4) are further 


SARS-CoV and 2c for MERS-CoV)*. We noted that the spike sequences 
are of low identity among different subgroup members. For example, 
MERS-CoV and SARS-CoV S proteins show a sequence identity of less 
than 28%. Nevertheless, RBDs of the two coronaviruses are homolog- 
ous for the core subdomain. Notably, the three interior disulphide 
bonds in the core are well-aligned for the steric positions in the two 
RBD structures and well-conserved in sequence among betacorona- 
viruses. Conversely, the external RBM region is highly variable in both 
length and residue composition (Supplementary Fig. 6). Consistently, 
no structural homology in this subdomain is observed between MERS- 
CoV and SARS-CoV. Yet it is this subdomain that engages cellular 
receptors. We therefore assume that betacoronaviruses probably have 
a similar core-domain fold in the S protein to present the external 
amino acids with divergent structures for viral pathogenesis, such as 
receptor recognition. 

Our work presents the fifth structure of virus S protein–receptor 
complexes in the Coronaviridae family'*'*”*. Taking into account both 
the RBD structure and the binding mode with receptors, MERS-CoV is 
related to SARS-CoV” (a single insertion functioning as RBM) but 
differs from porcine respiratory coronavirus~ and NL63 (ref. 18) of 
alphacoronaviruses (multiple discontinuous RBMs) (Supplementary 
Fig. 7). Nevertheless, related structural topologies can still be observed 
in RBDs of these coronaviruses”*. We noted that in the RBD-receptor 
complex structures of both MERS-CoV and porcine respiratory cor- 
onavirus the binding interfaces involve a receptor N-glycan. This 
might represent another cross-genus similarity in the Coronaviridae 
family, which supports a proposed common evolutionary origin of 
coronavirus S proteins”. It would therefore be interesting to investigate 
the contribution of the sugar moiety to the virus-receptor interaction 
for MERS-CoV in the future. 

Vaccination remains the most useful measure to combat viral infec- 
tion and transmission. A large number of antibodies show neutralization 
activity by targeting the RBD and thereby disrupting the virus—receptor 
engagement. Therefore, a properly folded RBD could be an ideal immu- 
nogen for vaccination, as demonstrated for SARS-CoV”. A recent report 
indeed shows the presence of S-specific neutralizing antibodies in 
MERS-CoV-infected patients”*. It may be worth attempting to test the 
immunization effect of MERS-CoV RBD in the future. 


METHODS SUMMARY 

Protein expression, purification, crystallization and structure determination. 
Both His-tagged CD26 and MERS-CoV RBD proteins were expressed in insect 
High Five cells using the Bac-to-Bac baculovirus expression system (Invitrogen). 
The recombinant proteins were then purified via nickel-chelated affinity chro- 
matography and gel filtration. Crystals were obtained by initial screening with the 
delineated in b-e for the amino acid interaction details. b, A strong polar- 
contact (H-bond and salt-bridge) network. c, d, The small patches of 
hydrophobic interactions. e, Contribution of the carbohydrate moiety. The 
residues involved are shown and labelled. NAG, N-acetyl-D-glucosamine; 
BMA, beta-D-mannose. 


commercially available kits followed by optimization. The RBD structure was 
solved by single-wavelength anomalous diffraction and the complex structure 
by molecular replacement. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


Protein expression and purification. The proteins used for crystallization and 
surface plasmon resonance experiments were prepared with the Bac-to-Bac bacu- 
lovirus expression system (Invitrogen). The coding sequences for MERS-CoV 
RBD (GenBank accession number JX869059, spike residues 367-606), SARS- 
CoV RBD (accession number NC_004718, spike residues 306-527), human 
CD26 (accession number NP_001926, residues 39-766) and human ACE2 (acces- 
sion number BAJ21180, residues 19-615) were individually cloned into the 
pFastBac1 vector. For each construct, a previously described gp67 signal peptide 
sequence”? was added to the protein N terminus for protein secretion, and a hexa- 
His tag was added to the C terminus to facilitate further purification processes. 
Transfection and virus amplification were conducted with Sf9 cells, and the 
recombinant proteins were produced in High Five cells. The cell culture was 
collected 48h after infection and passed through a 5-ml HisTrap HP column 
(GE Healthcare). After removal of most of the impurities, the recovered proteins 
were then pooled and further purified on a Superdex 200 column (GE Healthcare). 
Finally, each collected protein was prepared in a buffer consisting of 20 mM Tris-HCl 
(pH 8.0) and 150 mM NaCl and concentrated to about 10 mg ml⁻¹ for further 
use. 

To obtain the complex of MERS-CoV RBD bound to CD26, the individual 
proteins were mixed in vitro at a molar ratio of 1:1 and incubated at 4 °C for about 
2 h. The complex was then further purified on a Superdex 200 column, and concentrated 
to about 15 mg ml⁻¹ for crystallization experiments. 

To prepare the Fc chimaeric proteins, the fragment encoding MERS-CoV S1 
(residues 1-751) or NTD (residues 1-353) or RBD (adding the S residues 1-17 of 
the signal peptide to its N terminus to facilitate protein secretion) was fused 5’- 
terminally to a fragment coding for the Fc domain of mouse IgG and ligated into 
the pCAGGS expression vector. A mutant RBD-Fc protein-expressing plasmid 
was also constructed by site-directed mutagenesis, for which the identified hydrophilic 
residues involved in CD26 binding were mutated simultaneously (Y499F, 
N501A, K502A, D510A, E513A, D539A and R542A). The expression plasmids 
were then transfected into HEK293T cells. The cell culture was collected 48 h after 
transfection and directly used in the flow cytometric assay. 

Analytical gel filtration. MERS-CoV RBD, CD26 and their protein complex were 
individually prepared and adjusted to the same volume. The samples were then 
loaded onto a calibrated Superdex 200 column (GE Healthcare). The chromatograms 
were recorded and overlaid onto each other. The pooled proteins were 
analysed on a 12% SDS-PAGE gel and stained with Coomassie blue. 

Surface plasmon resonance assay. The BIAcore experiments were carried out at 
room temperature (25 °C) using a BIAcore 3000 machine with CM5 chips (GE 
Healthcare). For all the measurements, an HBS-EP buffer consisting of 10 mM 
HEPES, pH 7.5, 150 mM NaCl, 3 mM EDTA and 0.005% (v/v) Tween-20 was used, 
and all proteins were exchanged to the same buffer in advance via gel filtration. The 
MERS-CoV RBD and SARS-CoV RBD proteins were immobilized on the chip at 
about 500 response units. Gradient concentrations of human CD26 (0, 5, 10, 20, 40, 
80, 160, 320, 640 and 1,280 nM) or human ACE2 (0, 10, 20, 40, 80, 160, 320, 640 
and 1,280 nM) were then used to flow over the chip surface. After each cycle, the 
sensor surface was regenerated via a short treatment using 10 mM NaOH. The 
binding kinetics were analysed with the software BIAevaluation Version 4.1 using 
the 1:1 Langmuir binding model. 
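
To make the 1:1 Langmuir binding model concrete, the Python sketch below evaluates its standard closed-form expressions for the equilibrium, association-phase and dissociation-phase responses. It is only an illustration under stated assumptions: the rate constants are our reading of the values given in the main text, the maximum response R_MAX is a hypothetical number, and the function names are not part of any BIAcore software.

```python
import math


def equilibrium_response(conc_M, k_on, k_off, r_max):
    """Steady-state response for analyte concentration conc_M (1:1 model)."""
    k_d = k_off / k_on
    return r_max * conc_M / (conc_M + k_d)


def association_response(t_s, conc_M, k_on, k_off, r_max):
    """Response t_s seconds after injection starts, from zero bound analyte."""
    k_obs = k_on * conc_M + k_off  # observed rate constant during association
    r_eq = equilibrium_response(conc_M, k_on, k_off, r_max)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


def dissociation_response(t_s, r_start, k_off):
    """Response t_s seconds after the analyte is washed out."""
    return r_start * math.exp(-k_off * t_s)


K_ON = 1.79e5    # M^-1 s^-1, assumed from the main text
K_OFF = 2.99e-3  # s^-1, assumed from the main text
R_MAX = 100.0    # response units, hypothetical

# CD26 concentration series from the assay description, in nM.
for conc_nM in (5, 10, 20, 40, 80, 160, 320, 640, 1280):
    r_eq = equilibrium_response(conc_nM * 1e-9, K_ON, K_OFF, R_MAX)
    print(f"{conc_nM:>5} nM -> equilibrium response {r_eq:6.1f} RU")
```

A useful sanity check on such a model is that the equilibrium response reaches half of the maximum when the analyte concentration equals koff/kon.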

Flow cytometric assay. For the surface expression of CD26, the full-length coding 
sequence was cloned into the pEGFP-C1 vector which yields a plasmid encoding a 
recombinant CD26 protein with an EGFP-tag fused to its N terminus. The plasmid 
was transfected into the CD26-negative BHK cells using lipo2000 (Invitrogen) 
according to the manufacturer’s instructions. The cells were collected 48h after 
transfection. 

For staining, the mock-transfected BHK cells or the cells transfected with the 
CD26-expressing plasmid were suspended in PBS and incubated with the indi- 
vidual Fc-fusion protein culture or goat anti-CD26 IgG (R&D Systems) at room 
temperature for 1h. The cells were then washed and further incubated at room 
temperature for about 0.5 h with anti-mouse or anti-goat secondary IgG antibodies 
(R&D Systems). After washing, the cells were analysed by flow cytometry with a 
BD FACSCalibur machine. The cells incubated only with the secondary antibodies 
were used as the negative controls. 

Crystallization. All the crystals were obtained by the vapour-diffusion sitting-drop 
method with 1 µl protein mixing with 1 µl reservoir solution and then equilibrating 
against 100 µl reservoir solution at 18 °C. The initial crystallization screenings 
were carried out using the commercially available kits. The conditions that yield 
crystals were then optimized. Diffractable crystals of the free RBD protein were 
finally obtained in a condition consisting of 0.1 M ammonium tartrate dibasic, 
pH 7.0, and 12% PEG 3,350 with a protein concentration of 10 mg ml⁻¹. Derivative 
crystals were obtained by soaking RBD crystals for 24 h in mother liquor 
containing 2 mM KAuCl₄·2H₂O. The complex crystals were grown in 6% (v/v) 
2-propanol, 0.1 M sodium acetate pH 4.5 and 26% PEG550 with a protein concentration 
of 15 mg ml⁻¹. 

Data collection, integration and structure determination. For data collection, 
all crystals were flash-cooled in liquid nitrogen after a brief soaking in reservoir 
solution with the addition of 20% (v/v) glycerol. The native RBD data set was 
collected at the High Energy Accelerator Research Organization (KEK) BL1A 
(wavelength, 1.03818 Å), whereas the diffraction data for the Au derivative crystal 
(wavelength, 1.0382 Å) and the complex crystal (wavelength, 0.97930 Å) were 
collected at the Shanghai Synchrotron Radiation Facility (SSRF) BL17U. All data 
were processed with HKL2000 (ref. 30). Additional processing was performed 
with programs from the CCP4 suite”’. 

The structure of RBD was determined by the single-wavelength anomalous 
diffraction (SAD) method. The Au sites were first located by SHELXD” for the 
Au-SAD data. The identified positions were then refined and the phases were 
calculated with SAD experimental phasing module of Phaser’’. The real space 
constraints were further applied to the electron density map in DM™. The initial 
model was built with Autobuild in Phenix package**. Additional missing residues 
were added manually in Coot’®. The final model was refined with phenix.refine in 
the Phenix** with energy minimization, isotropic ADP refinement, and bulk solv- 
ent modelling. The complex structure was solved by molecular replacement mod- 
ule of Phaser*’, with the solved RBD structure and previously reported CD26 
structure (PDB code, 2BGR) as the search models. The atomic model was com- 
pleted with Coot” and refined with phenix.refine*’. The stereochemical qualities 
of the final models were assessed with PROCHECK*’”. The Ramachandran plot 
distributions for the residues in the free RBD structure were 86.8, 11.8 and 1.4% 
for the most favoured, additionally and generously allowed regions, respectively. 
These values were 86.5, 13.1 and 0.5% for the RBD-CD26 complex structure. Data 
collection and refinement statistics are summarized in Supplementary Table 1. All 
structural figures were generated using Pymol (http://www.pymol.org). 
Secondary-structure determination. The secondary structure determination was 
based on the ESPript”* algorithm. 
Manipulation of the gut microbiota holds great promise for the 
treatment of inflammatory and allergic diseases'*. Although numer- 
ous probiotic microorganisms have been identified’, there remains 
a compelling need to discover organisms that elicit more robust thera- 
peutic responses, are compatible with the host, and can affect a specific 
arm of the host immune system in a well-controlled, physiological 
manner. Here we use a rational approach to isolate CD4⁺FOXP3⁺ 
regulatory T (Treg)-cell-inducing bacterial strains from the human 
indigenous microbiota. Starting with a healthy human faecal sample, 
a sequence of selection steps was applied to obtain mice colonized with 
human microbiota enriched in Treg-cell-inducing species. From these 
mice, we isolated and selected 17 strains of bacteria on the basis of 
their high potency in enhancing Treg cell abundance and inducing 
important anti-inflammatory molecules, including interleukin-10 
(IL-10) and inducible T-cell co-stimulator (ICOS), in Treg cells upon 
inoculation into germ-free mice. Genome sequencing revealed that 
the 17 strains fall within clusters IV, XIVa and XVIII of Clostridia, 
which lack prominent toxins and virulence factors. The 17 strains 
act as a community to provide bacterial antigens and a TGF-B-rich 
environment to help expansion and differentiation of T,.g cells. 
Oral administration of the combination of 17 strains to adult mice 
attenuated disease in models of colitis and allergic diarrhoea. Use of 
the isolated strains may allow for tailored therapeutic manipulation 
of human immune disorders. 

CD4+FOXP3+ Treg cells are present most abundantly in the intestinal mucosa at
steady state, and contribute to intestinal and systemic immune homeostasis. In
germ-free mice, the frequency of colonic Treg cells and levels of IL-10
expression by Treg cells are markedly reduced. We have shown previously that a
combination of Clostridia strains isolated from conventionally reared mice
potently affects the number and function of CD4+FOXP3+ Treg cells in mouse
colonic lamina propria. In an attempt to enable clinical translation of our
previous findings, we aimed to identify Treg-cell-inducing bacterial strains
derived from the human microbiota (see Supplementary Fig. 1 for a summary of the
procedure).

We obtained a human stool sample from a healthy Japanese volunteer. Because we
previously reported that the chloroform-resistant fraction of mouse gut
microbiota was enriched in Treg-cell-inducing species, the stool sample was
either untreated or treated with chloroform and orally inoculated into IQI/Jic
germ-free mice. Each group of ex-germ-free (exGF) mice was separately housed for
3-4 weeks in vinyl isolators to avoid further microbial contamination. Although a
recent study showed that the human microbiota had no impact on the immune
responses in the mouse small intestine, we observed a significant increase in the
percentage of FOXP3+ Treg cells among CD4+ T cells in the colons of exGF mice
inoculated with untreated human faeces compared with germ-free mice (Fig. 1a and
Supplementary Fig. 2). Notably, a more pronounced increase was observed in the
colons of exGF mice inoculated with chloroform-treated human faeces (Fig. 1a).
These findings suggest that the human intestinal microbiota contains
Treg-cell-inducing bacteria, and that they are enriched in the
chloroform-resistant fraction. We also examined the effects of human faeces
inoculation on colonic IL-17- and IFN-γ-expressing CD4+ cells (TH17 and TH1
cells). In exGF mice inoculated with untreated or chloroform-treated human
faeces, the frequency of TH1 cells was unchanged compared with germ-free mice
(Fig. 1b). By contrast, there was a significant accumulation of TH17 cells in the
colons of exGF mice inoculated with untreated human faeces (Fig. 1c and
Supplementary Fig. 2). Notably, the capacity of human faeces to induce TH17 cells
was greatly diminished after treatment with chloroform (Fig. 1c). These results
indicate that the chloroform-sensitive bacterial fraction in the human stool
tested contained TH17-cell-inducing bacteria, whereas the chloroform-resistant
bacteria preferentially promoted Treg cell accumulation in the colon.

Figure 1 | Treg cell accumulation in germ-free mice induced by inoculation
with human microbiota. a-e, The percentages of FOXP3+, IL-17+ and IFN-γ+
cells within the CD4+ cell population in the colon lamina propria of the indicated
mice are shown (see also Supplementary Fig. 2). Circles represent individual
animals. The height of the black bars indicates the mean. All experiments were
performed more than twice with similar results. Error bars indicate s.d.
**P < 0.01; NS, not significant. +hu, exGF mice inoculated with untreated
human faeces; +huChlo, exGF mice inoculated with chloroform-treated human
faeces. (See the main text for further definitions of x-axis labels.)

To investigate whether Treg cell induction by the chloroform-resistant fraction
of human intestinal bacteria is transmissible, adult germ-free mice were
co-housed with exGF mice inoculated with chloroform-treated human faeces for
4 weeks. Co-housed mice showed a significant increase in the frequency of colonic
Treg cells (Fig. 1d). In addition, the progeny of exGF mice inoculated with
chloroform-treated human faeces also showed increased numbers of Treg cells
(Fig. 1d). Therefore, Treg cell induction by human intestinal bacteria is
horizontally and vertically transmissible. Oral inoculation of germ-free mice
with 2 × 10^4-fold diluted caecal samples from exGF mice inoculated with
chloroform-treated human faeces fully induced the accumulation of Treg cells in
the colon lamina propria, suggesting that abundant rather than minor members of
the intestinal microbiota in exGF mice inoculated with chloroform-treated human
faeces drive the observed induction of Treg cells (Fig. 1e). The
Treg-cell-inducing microbiota in mice inoculated with the 2 × 10^4-fold diluted
sample (+2 × 10^4 mice) was a stable community, because serial oral inoculation
of caecal contents from these mice equally induced the accumulation of Treg cells
in secondary (+2 × 10^4-re mice) and tertiary recipients (+2 × 10^4-re-re mice)
(Fig. 1e). To minimize nonessential components of the microbiota for Treg cell
induction, the caecal contents of +2 × 10^4 mice were again diluted 2 × 10^4-fold
and orally inoculated into another set of germ-free mice (+(2 × 10^4)^2 mice).
The +(2 × 10^4)^2 mice had a marked accumulation of Treg cells in the colon
(Fig. 1e). These results suggested that we succeeded in obtaining mice colonized
with a relatively restricted and stable community of bacterial species enriched
for Treg cell inducers.

Figure 2 | Assessment of microbiota composition and isolation of
Treg-cell-inducing strains. a, b, Pyrosequencing of 16S rRNA genes was performed
on caecal contents from the indicated mice. Relative abundance of OTUs (%) in
the caecal bacterial community in each mouse (a), and the closest species/strain
in the database and the corresponding isolated strain number for the indicated
OTU (b) are shown. c, SEM showing the proximal colon of +23-mix mice.
Original magnification, ×20,000. d, e, The percentages of FOXP3+ cells
within the CD4+ cell population (d) and Helios− cells in CD4+FOXP3+ cells
(e) in the colon of the indicated mice. Circles represent individual animals. All
experiments were performed more than twice with similar results. Error bars
indicate s.d. **P < 0.01.

The composition of the gut microbiota in mice treated with human samples was
analysed by 16S ribosomal RNA (rRNA) gene amplicon sequencing using a 454
sequencer. Quality filter-passed sequences (3,000 reads for each sample) were
classified into operational taxonomic units (OTUs) based on sequence similarity
(>96% identity). The numbers of detected reads and closest known species for each
OTU are shown in Supplementary Table 1, and the relative abundance of OTUs in
each caecal sample is shown in Fig. 2a. As expected, the OTU profiles of mice
treated with human faeces were quite different from those of conventional
specific pathogen-free (SPF) mice (Supplementary Fig. 3). In mice inoculated with
untreated human faeces, OTUs belonging to Bacteroidetes accounted for about 50%
of the caecal microbial community (Fig. 2a). By contrast, most OTUs in exGF mice
inoculated with chloroform-treated human faeces were related to Clostridia
species. Most bacteria in +2 × 10^4, +2 × 10^4-re and +(2 × 10^4)^2 mice had 16S
rRNA gene sequence similarities with about 20 species of Clostridia, listed in
Fig. 2b.
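
As a rough illustration of how the relative abundances plotted in Fig. 2a relate
to read counts, the sketch below converts per-OTU read counts into percentages of
a 3,000-read sample. The OTU names and counts are invented for illustration and
are not taken from Supplementary Table 1.

```python
def relative_abundance(otu_counts):
    """Convert per-OTU read counts into percentages of the sample total."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {otu: 100.0 * n / total for otu, n in otu_counts.items()}


# Hypothetical counts for one caecal sample (3,000 filter-passed reads).
sample = {"OTU_A": 1500, "OTU_B": 900, "OTU_C": 600}
abundance = relative_abundance(sample)
```

Because every sample is fixed at the same read depth, percentages computed this
way can be compared directly between mice.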

To isolate bacterial strains with Treg-cell-inducing capabilities, we cultured
caecal contents from +2 × 10^4, +2 × 10^4-re and +(2 × 10^4)^2 mice in vitro and
picked 442 colonies. BLAST searches of 16S rRNA gene sequences of the isolated
colonies revealed that 31 strains in total were present, all of which were
Clostridia (Supplementary Fig. 4). Of the 31 strains, we selected 23 that had
less than 99% 16S rRNA gene sequence identity to any of the other 30 strains
(Supplementary Fig. 4). We then individually cultured the 23 strains, mixed them
in equal amounts, and orally inoculated the mixture into germ-free IQI mice
(+23-mix mice). Numerous rod- and round-shaped bacteria were observed by scanning
electron microscopy (SEM) on the epithelial cell surface in +23-mix mice
(Fig. 2c), and the size and appearance of the caeca were quite different from
those in germ-free mice, indicating successful colonization (Supplementary
Fig. 5a). Pyrosequencing of 16S rRNA genes revealed that the caecal microbiota
composition in +23-mix mice was quite similar to that in +(2 × 10^4)^2 mice
(Fig. 2a). In +23-mix mice, we observed efficient induction of Treg cells in the
colonic lamina propria (Fig. 2d). The magnitude was comparable to that observed
in exGF mice inoculated with chloroform-treated human faeces and much higher than
that in mice colonized with Faecalibacterium prausnitzii, a human Clostridia
strain well known for enhancing regulatory cell functions (Fig. 2d). Most Treg
cells in +23-mix mice expressed low levels of Helios (also known as IKZF2),
indicating antigen-experienced cells (Fig. 2e, Supplementary Fig. 5b
and ref. 10).
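
The reduction from 31 to 23 strains by the 99% identity criterion can be pictured
as a dereplication step. The sketch below shows one plausible reading of that
criterion (keep a strain only if it is less than 99% identical to every strain
already kept); the text does not specify the exact procedure, and the toy
sequences, names and pre-aligned identity measure are assumptions made for
illustration.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def dereplicate(seqs, threshold=99.0):
    """Greedily keep strains that are < threshold % identical to all kept ones."""
    kept = []
    for name, seq in seqs.items():
        if all(percent_identity(seq, seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy 200-base "16S" sequences (hypothetical, not real data).
toy = {
    "s1": "A" * 200,
    "s2": "A" * 199 + "C",       # 99.5% identical to s1, so treated as redundant
    "s3": "A" * 190 + "C" * 10,  # 95.0% identical to s1, so kept
}
selected = dereplicate(toy)
```

A greedy pass like this depends on input order; a real analysis would more likely
start from a full pairwise identity matrix built from aligned sequences.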

Only 17 strains listed in Fig. 2b and Supplementary Fig. 4 were detected in
+23-mix mice by 16S rRNA gene sequencing, indicating that these 17 strains may be
sufficient to induce Treg cells. Indeed, we found that the mixture of 17 strains
(17-mix) induced FOXP3+ Treg cells to a similar extent as the 23-mix (Fig. 3a).
The increase in Treg cells induced by the 17-mix was reproducibly observed in
exGF mice of different genetic backgrounds (IQI, BALB/c and C57BL/6) (Fig. 3a).
Moreover, the mix was effective in other rodents: the frequency of colonic Treg
cells in exGF rats inoculated with 17-mix was significantly higher than that in
germ-free rats and comparable to that in SPF rats (Fig. 3a). The colonization
with 17-mix induced a significant increase in the frequency of IL-10+ and/or
ICOS+ cells within the Treg cell population, as revealed by analysis of exGF
IL-10 reporter mice (Il10^Venus mice, ref. 4) colonized with the 17-mix
(Fig. 3b). Furthermore, IL-10+ Treg cells in +17-mix mice expressed high levels
of CTLA4 (Supplementary Fig. 5c). Because IL-10 and CTLA4 are essential for the
immunosuppressive activity of Treg cells, and ICOS is required for the
Treg-cell-mediated suppression of TH2 responses, these results suggest that the
mixture of 17 strains affects both the number and function of Treg cells in the
colon. Next, we monocolonized germ-free mice with one of each of the 17
individual strains to determine their individual Treg cell induction capability.
The monocolonized exGF mice exhibited low to intermediate levels of Treg cell
induction with inter-individual variability (Supplementary Fig. 6a). As expected,
none of the strains induced TH17 cells in the monocolonized mice (Supplementary
Fig. 6b). We also examined Treg cell induction by subsets of the 17-mix (randomly
selected combinations of 3-5 strains: 3-mix, 5-mix-A, 5-mix-B, and 5-mix-C, see
Supplementary Fig. 4). Although all tested combinations of 5-mix induced
increases in the frequency of Treg cells, the magnitude was substantially lower
than that observed in +17-mix mice (Fig. 3c). Therefore, it is likely that the
17 strains act synergistically to amplify the induction of Treg cells in a
microbial-community-dependent fashion.

Figure 3 | Characterization of 17 Treg-cell-inducing strains. a, The
percentages of FOXP3+ cells within the CD4+ cell population in the colon
lamina propria of the indicated mice and rats. b, The expression of Venus
(IL-10) and FOXP3 by the gated colonic lamina propria CD4+ cells, and ICOS
expression by CD4+FOXP3+ cells in exGF Il10^Venus mice colonized with or
without 17-mix. c, Percentages of FOXP3+ cells within the CD4+ cell population
in IQI exGF mice colonized with the indicated mix. d, The production of TNF-α
and TGF-β1 in HCT8 cells stimulated with caecal extracts from the indicated
mice. e, CD8+ T cells from OT-I mice (Teff) and the indicated ratio of colon
lamina propria CD4+CD25+ cells from +17-mix mice (Treg) were incubated with
CD11c+ cells pulsed with OT-I peptide alone or in combination with autoclaved
caecal contents from +17-mix mice (+17-mix caecal), germ-free mice (+GF
caecal), or autoclaved 17 strains cultured in vitro (+17 st in vitro). Depicted
data represent average of duplicates (see also Supplementary Fig. 9c). Circles in
a, c-e represent samples from individual animals. All experiments were performed
more than twice with similar results. Error bars indicate s.d. **P < 0.01;
*P < 0.05; NS, not significant.

To investigate the mechanism for the Treg cell induction by the community of 17
strains, we incubated various human and mouse intestinal epithelial cell lines
and primary cells with aqueous extracts from caecal contents from the +17-mix
mice, and assessed the production of the active form of TGF-β1, a key cytokine
for the differentiation and expansion of Treg cells. The caecal extracts from
+17-mix mice routinely elicited TGF-β1, but not IL-6 and TNF-α, production, and
the magnitude was significantly higher than that elicited by caecal extracts from
single-strain or 5-mix-colonized mice (Fig. 3d and Supplementary Fig. 7). The
induction of TGF-β1 was not inhibited by pre-treatment of the caecal extracts
with a protease or nuclease (Supplementary Fig. 7c). Short-chain fatty acids
(SCFAs) are protease- and nuclease-insensitive and have been associated with
regulation of host immune homeostasis. Quantitative analysis of SCFAs in caecal
contents from +17-mix mice showed a significantly higher concentration of
acetate, propionate, butyrate and isobutyrate than that in single-strain or
5-mix-colonized mice (Supplementary Fig. 8a). Furthermore, a mixture of sodium
salts of these SCFAs elicited TGF-β1 production in epithelial cells to a level
similar to that seen when the cells were stimulated with caecal extracts from
+17-mix mice (Supplementary Fig. 8b). Therefore, the community of 17 strains
cooperatively produces SCFAs that can elicit a TGF-β response, and this activity
may contribute to the differentiation and expansion of Treg cells. We also
investigated whether the 17 strains provide bacterial antigens to T cells. To do
this, we addressed the antigen specificity of Treg cells accumulated in +17-mix
mice using a cognate antigen-driven suppression assay. CD4+CD25+ lamina propria
T cells from +17-mix mice substantially inhibited the OT-I ovalbumin (OVA)
peptide-driven proliferation of OT-I CD8 T cells, and this suppression was
markedly enhanced in the presence of autoclaved caecal content from +17-mix mice
or autoclaved 17 strains cultured in vitro, but not in the presence of OT-II OVA
peptide or caecal content from germ-free mice (Fig. 3e and Supplementary Fig. 9).
These results are consistent with previous reports and suggest that some fraction
of colonic lamina propria Treg cells in +17-mix mice is specific to the 17
strains of Clostridia. Next, we assessed the kinetics of Treg cell accumulation
and their expression of Ki67, a cell-cycle-associated nuclear protein, and
gut-homing-associated molecules CD103 and β7 integrin. We observed a marked
increase in the proportion of Ki67-, CD103- and β7-expressing cells by 1 week
after inoculation with the 17-mix (Supplementary Figs 10 and 11). Collectively,
these observations indicate that the 17 strains provide SCFAs, bacterial antigens
and probably other factors, which together contribute to differentiation,
expansion and colonic homing of Treg cells.

To define the identity of the 17 bacterial strains fully, we sequenced their
genomes (Supplementary Fig. 12). Phylogenetic comparison of the 17 strains using
ribosomal multilocus sequence typing (rMLST) revealed that the 17 strains belong
to bacterial species falling within clusters XIVa, IV and XVIII of Clostridia as
defined previously (in a recent taxonomy, members of cluster XVIII Clostridia
were reclassified in the class Erysipelotrichi) (Supplementary Fig. 13). The
genome sequencing also revealed that the 17 strains all lack strong
virulence-related genes such as collagenase and phospholipase C, often identified
in pathogenic Clostridia species (Supplementary Table 2). We then examined the
relative abundance of the 17 strains in healthy and ulcerative colitis human
subjects using draft genome sequences of the 17 strains and publicly available
human microbiome genomes generated through the MetaHIT project. Ulcerative
colitis subjects showed a tendency towards a reduction of the 17 strains, and 5
out of the 17 strains were significantly reduced in ulcerative colitis patients
(Supplementary Fig. 14).

To evaluate the potential benefits of supplementation with the 17 strains, 17-mix
or control PBS was orally administered into adult SPF mice every 2 or 3 days
(SPF + 17-mix or SPF + ctrl, respectively). We confirmed a significant increase
in the frequency of colonic Treg cells in SPF + 17-mix mice compared with
SPF + ctrl mice after 3 weeks of treatment (Fig. 4a). While being continuously
treated with 17-mix or control, mice were subjected to the OVA-induced allergic
diarrhoea model. The occurrence and severity of diarrhoea and the OVA-specific
serum IgE levels were significantly reduced in SPF + 17-mix mice relative to
control mice (Fig. 4b-d). The protective effect of 17-mix was significantly
attenuated by treatment of mice with a Treg-cell-depleting anti-CD25 antibody
(Supplementary Fig. 15). We also subjected mice to an experimental colitis model
induced by trinitrobenzene sulphonic acid (TNBS). SPF + 17-mix mice showed less
severe colon shortening and milder histological disease features, accompanied by
lower mortality than control mice (Fig. 4e-g and Supplementary Fig. 16a). In
keeping with this clinical outcome, there was significantly increased expression
of Foxp3 and Tgfb1 mRNA in SPF + 17-mix mice compared with control mice, as well
as a tendency towards a reduction of inflammatory cytokine transcripts
(Supplementary Fig. 16b). Identical suppression of colitis by 17-mix was also
observed in an adoptive transfer model, in which germ-free SCID mice were orally
inoculated with faeces from SPF mice together with or without 17-mix and then
transferred with CD4+CD45RBhi T cells (Supplementary Fig. 17).

Figure 4 | Treatment with 17-mix suppresses experimental colitis models.
a, The percentages of FOXP3+ cells within the CD4+ cell population in
SPF + 17-mix or SPF + ctrl mice. b-d, SPF + 17-mix (n = 9) and SPF + ctrl (n = 7)
mice were subjected to OVA-induced diarrhoea. The diarrhoea score (b; see
Methods for definition), representative photographs of faeces (c), and
OVA-specific IgE levels in the sera (d) are shown. sc, subcutaneous. e-g,
SPF + 17-mix (n = 8) and SPF + ctrl (n = 7) mice were treated with TNBS. Animal
survival (e), haematoxylin and eosin staining (original magnification, ×10) (f),
and histology score of the distal colon (g) on day 4 after TNBS administration
are shown. Data are representative of two independent experiments. Error bars
indicate s.d. **P < 0.01; *P < 0.05.

The clinical track record of efficacy of single-strain probiotics has been
modest. It has been postulated that a collection of functionally distinct
bacterial species rationally selected from the human gut microbiota may be more
effective than single strains in preventing/treating disease. In the present
study, we isolated 17 strains within Clostridia clusters XIVa, IV and XVIII from
a human faecal sample; these strains affect Treg cell differentiation,
accumulation and function in the mouse colon. It remains to be seen whether the
17 strains will have similar effects in the human intestine; however, a decreased
prevalence of Clostridia clusters XIVa and IV in faecal samples from patients
with inflammatory bowel disease and atopy may suggest that supplementation with
the 17-strain bacterial community might counterbalance dysbiosis, induce Treg
cells and aid in the management of allergic and inflammatory conditions.


METHODS SUMMARY 


Experiments were performed with authorization from the Institutional Review 
Board for Human Research at RIKEN Yokohama Research Institute. Human 
stool from a healthy volunteer (Japanese, male, age 31 years) was obtained with 
informed consent. The sample was mixed with or without chloroform, and the 
aliquots were inoculated into germ-free IQI mice. Detailed procedures for lamina 
propria lymphocyte analysis, isolation of bacteria, extraction of bacterial DNA and 
sequencing are described in Methods. Statistical analysis was performed using the 
Student’s t-test. 
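
For readers unfamiliar with the test, the sketch below computes a two-sample
Student's t statistic with pooled variance. The text does not state whether the
test was paired or unpaired, or one- or two-tailed, so the unpaired
equal-variance form is an assumption, and the group values are hypothetical
percentages rather than data from the paper.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical % FOXP3+ values for colonized and germ-free animals.
colonized = [30.0, 32.0, 34.0]
germ_free = [10.0, 12.0, 14.0]
t_stat, dof = student_t(colonized, germ_free)
```

The P value then follows from the t distribution with `dof` degrees of freedom,
which requires a statistics library or table and is left out of this sketch.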


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Mice and rats. C57BL/6, BALB/c, IQI/Jic and CB.17 SCID mice and F344 rats kept
under SPF or germ-free conditions were purchased from Sankyo laboratories,
Japan SLC, or CLEA Japan. IQI germ-free mice were used unless otherwise
indicated. Germ-free and gnotobiotic mice were bred and maintained in vinyl
isolators within the gnotobiotic facility of Sankyo laboratories. Germ-free
Il10^Venus mice were generated as previously described. OT-I and OT-II T-cell
receptor transgenic mice were purchased from Taconic Farms. All animal
experiments were approved by the Animal Research Committee of RIKEN Yokohama
Institute and the University of Tokyo.

Chloroform treatment of human stool and generation of gnotobiotic mice.
Human stool from a healthy volunteer (Japanese, male, age 31 years) was obtained
with informed consent. Human stool and mouse caecal contents were directly
frozen at −80 °C, or suspended in 4 times the volume (w/v) of phosphate-buffered
saline (PBS) + 20% glycerol solution, snap-frozen in liquid nitrogen and stored
at −80 °C until use. The frozen stocks were thawed, suspended in 10 times the
volume (w/v) of PBS, and passed through a 70 µm cell strainer to eliminate clumps
and debris. Then suspensions were mixed with chloroform (final concentration 3%),
and incubated in a shaking water bath for 60 min. After evaporation of chloroform
by bubbling with N2 gas for 30 min, the aliquots containing the
chloroform-resistant fraction of intestinal bacteria were inoculated into
germ-free mice by intra-gastric administration (250 µl per mouse). To generate a
series of gnotobiotic mice inoculated with diluted samples, caecal contents from
exGF mice were treated with chloroform, diluted with PBS, and inoculated into
germ-free IQI mice. The caecal suspensions diluted 2 × 10^4-fold correspond to
2.5 × 10^4 bacterial cells per mouse. Each group of exGF mice was individually
caged in the gnotobiotic isolator for 3-4 weeks at Sankyo Lab service.
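
The dilution figures above can be checked with a little arithmetic. The sketch
below assumes the dilution factor is 2 × 10^4 and the corresponding dose is
2.5 × 10^4 cells per mouse, as read from the text; the variable names are
illustrative only.

```python
DILUTION_FACTOR = 2e4    # fold dilution of the caecal suspension
CELLS_PER_MOUSE = 2.5e4  # cells delivered per mouse at that dilution

# Number of cells the same dose would contain without dilution.
undiluted_equivalent = CELLS_PER_MOUSE * DILUTION_FACTOR

# Cumulative dilution after a second round, i.e. for (2 x 10^4)^2 recipients,
# ignoring bacterial regrowth in the intermediate host.
cumulative_dilution = DILUTION_FACTOR ** 2
```

Under these assumptions an undiluted dose would hold about 5 × 10^8 cells, and
two rounds of dilution amount to a nominal 4 × 10^8-fold reduction, which is why
only abundant community members are expected to be carried over.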

Isolation of intestinal lamina propria lymphocytes and flow cytometry. The
colons were collected and opened longitudinally, washed with PBS to remove all
luminal contents and shaken in Hanks' balanced salt solution (HBSS) containing
5 mM EDTA for 20 min at 37 °C. After removing epithelial cells, muscle layers and
fat tissue using forceps, the lamina propria layers were cut into small pieces
and incubated with RPMI1640 containing 4% fetal bovine serum, 0.5 mg ml^-1
collagenase D, 0.5 mg ml^-1 dispase and 40 µg ml^-1 DNase I (all Roche
Diagnostics) for 1 h at 37 °C in a shaking water bath. The digested tissues were
washed with HBSS containing 5 mM EDTA, resuspended in 5 ml of 40% Percoll (GE
Healthcare) and overlaid on 2.5 ml of 80% Percoll in a 15-ml Falcon tube. Percoll
gradient separation was performed by centrifugation at 800g for 20 min at 25 °C.
The lamina propria lymphocytes were collected from the interface of the Percoll
gradient and suspended in ice-cold PBS. For analysis of Treg cells, isolated
lymphocytes were labelled with the LIVE/DEAD fixable dead cell stain kit
(Invitrogen) to exclude dead cells from the analysis. The cells were washed with
staining buffer containing PBS, 2% FBS, 2 mM EDTA and 0.09% NaN3, and surface
staining was performed with PECy7- or Pacific blue-labelled anti-CD4 antibody
(RM4-5, BD Biosciences), PE-labelled anti-ICOS antibody (C938.4A, BioLegend),
Alexa488-labelled anti-CD103 antibody (2E7, BioLegend), and
PerCP/Cy5.5-labelled anti-integrin-β7 antibody (FIB27, BioLegend). Intracellular
staining of FOXP3, CTLA4, Helios and Ki67 was performed using the
Alexa647-labelled anti-FOXP3 antibody (FJK-16s, eBioscience), PE-labelled
anti-CTLA4 antibody (UC10-4F10-11, BD Biosciences), PE-labelled anti-Helios
antibody (22F6, BioLegend), PECy7-labelled anti-Ki67 antibody (B56, BD
Biosciences) and FOXP3 staining buffer set (eBioscience). For analysis of TH1 and
TH17 cells, isolated lymphocytes were stimulated for 4 h with 50 ng ml^-1 phorbol
12-myristate 13-acetate (PMA, Sigma) and 1 µg ml^-1 ionomycin (Sigma) in the
presence of GolgiStop (BD Biosciences). After incubation for 4 h, cells were
washed in PBS, labelled with the LIVE/DEAD fixable dead cell stain kit and
surface CD4 was stained with PECy7-labelled anti-CD4 antibody. Cells were washed,
fixed in Cytofix/Cytoperm, permeabilized with Perm/Wash buffer (BD Biosciences),
and stained with the APC-labelled anti-IL-17 antibody (eBio17B7, eBioscience) and
FITC-labelled anti-IFN-γ antibody (XMG1.2, BD Biosciences). The antibody-stained
cells were analysed with LSR Fortessa or FACSAria III (BD Biosciences), and data
were analysed using FlowJo software (Treestar).

Meta 16S rRNA gene sequencing. The caecal contents from exGF mice were
suspended in 10 ml of Tris-EDTA containing 10 mM Tris-HCl and 1 mM EDTA
(pH 8), and incubated with lysozyme (Sigma, 15 mg ml^-1) at 37 °C for 1 h with
gentle mixing. A purified achromopeptidase (Wako) was added (final concentration
2,000 units ml^-1) and further incubated at 37 °C for another 30 min. Then,
sodium dodecyl sulphate (final concentration 1%) was added to the cell suspension
and mixed well. Subsequently, proteinase K (Merck) was added (final concentration
1 mg ml^-1) to the suspension and the mixture was incubated at 55 °C for 1 h.
High-molecular-mass DNA was isolated and purified by phenol/chloroform
extraction, ethanol and finally polyethylene glycol precipitation. PCR was
performed using Ex Taq (TAKARA) and (1) the 454 primer A
(5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG (454 adaptor sequence) + barcode (10
bases) + AGRGTTTGATYMTGGCTCAG-3' (27Fmod)) and (2) the 454 primer
B (5'-CCTATCCCCTGTGTGCCTTGGCAGTCTCAG (454 adaptor sequence)
+ TGCTGCCTCCCGTAGGAGT-3' (338R)), targeting the V1-V2 region of the 16S
rRNA gene. Amplicons generated from each sample (~330 bp) were subsequently
purified using AMPure XP (Beckman Coulter). The amount of DNA was quantified
using Quant-iT Picogreen dsDNA assay kit (Invitrogen) and TBS-380mini
fluorometer (Turner Biosystems). Then, the amplified DNA was used as template
for 454 GS Junior (Roche) pyrosequencing using GS Junior Titanium emPCR Kit-Lib-L,
GS Junior Titanium Sequencing Kit and GS Junior Titanium PicoTiterPlate
Kit (all Roche) according to the manufacturer's instructions. Quality-filter-passed
reads were obtained by removing reads that did not have both primer sequences,
had an average quality value (QV) <25, or were possibly chimaeric. For each
sample, 3,000 filter-passed reads with both primer sequences trimmed off were
used and subjected to OTU analysis with a cutoff similarity of 96% identity.
Representative sequences from each OTU were searched with BLAST against the
Ribosomal Database Project (RDP) database of bacterial isolates, our genome
database constructed from publicly available genome sequences in NCBI and HMP
databases, and 16S sequences of the 23 strains obtained in this study.
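
The read-filtering steps described above can be sketched as follows. This is a
simplified illustration only: the primers are short non-degenerate placeholders,
primers are matched exactly at the read ends, chimaera detection is omitted, and
random subsampling to 3,000 reads is an assumption, since the text does not say
how the 3,000 reads were chosen.

```python
import random

FWD = "ACGT"     # placeholder forward primer (hypothetical)
REV_RC = "TTGA"  # placeholder reverse primer, reverse-complemented (hypothetical)


def passes_filter(seq, quals, fwd=FWD, rev_rc=REV_RC, min_mean_qv=25):
    """Keep reads that carry both primers and have mean QV >= min_mean_qv."""
    has_primers = seq.startswith(fwd) and seq.endswith(rev_rc)
    return has_primers and sum(quals) / len(quals) >= min_mean_qv


def trim(seq, fwd=FWD, rev_rc=REV_RC):
    """Remove both primer sequences from a read that passed the filter."""
    return seq[len(fwd):len(seq) - len(rev_rc)]


def subsample(reads, n=3000, seed=0):
    """Draw n reads without replacement so every sample has equal depth."""
    rng = random.Random(seed)
    return rng.sample(reads, n) if len(reads) > n else list(reads)


reads = [
    ("ACGTGGGGTTGA", [30] * 12),  # both primers, high quality: kept
    ("ACGTGGGGTTGA", [20] * 12),  # mean QV below 25: removed
    ("CCGTGGGGTTGA", [30] * 12),  # forward primer missing: removed
]
kept = [trim(s) for s, q in reads if passes_filter(s, q)]
depth_matched = subsample(kept)
```

Fixing the depth per sample before OTU clustering keeps relative abundances
comparable between mice regardless of how many reads each run produced.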

Isolation of bacterial strains. The frozen stocks of caecal contents from exGF
mice were serially diluted with PBS and seeded onto non-selective agar plates
(blood liver (BL) agar (Eiken Chemical) or Eggerth-Gagnon (EG) agar plates).
EG agar plates contain the following components (quantities expressed per litre):
Lab-Lemco Powder (2.8 g, Oxoid); proteose peptone no. 3 (10.0 g, Difco); yeast
extract (5.0 g, Difco); Na2HPO4 (4.0 g); D(+)-glucose (1.5 g); soluble starch
(0.5 g); L-cystine (0.2 g); L-cysteine·HCl·H2O (0.5 g); Tween 80 (0.5 g); Bacto
agar (16.0 g, Difco); and defibrinated horse blood (50 ml). After culture under
aerobic conditions or strictly anaerobic conditions (80% N2, 10% H2, 10% CO2) at
37 °C for 2 or 4 days, individual colonies were picked up and cultured for an
additional 2 or 4 days at 37 °C in ABCM broth (Eiken Chemical) or EG agar plate.
The isolated strains were collected into EG stock medium (10% glycerol) and
stored at −80 °C. To identify the isolated strains, 16S rRNA gene sequences were
determined. The 16S rRNA gene was amplified by colony-PCR using KOD FX (TOYOBO)
and GeneAmp PCR System 9700 (Applied Biosystems) using 16S rRNA gene-specific
primer pairs: 8F (5'-AGAGTTTGATCMTGGCTCAG-3') and 519R
(5'-ATTACCGCGGCKGCTG-3') or 1513R (5'-ACGGCTACCTTGTTACGACTT-3'). The
amplification program consisted of one cycle at 98 °C for 2 min, followed
by 40 cycles at 98 °C for 10 s, 57 °C for 30 s and 68 °C for 1 min 30 s. Each
amplified DNA was purified from the reaction mixture using Illustra GFX PCR DNA
and gel band purification kit (GE Healthcare). Sequence analysis was performed
using BigDye Terminator V3.1 cycle sequencing kit (Applied Biosystems) and
Applied Biosystems 3730xl DNA analyser (Applied Biosystems). The resulting
sequences were compared with sequences in RDP database and genome database using
BLAST to determine close species/strains.
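
The primers above contain IUPAC degenerate bases (M in 8F, K in 519R), meaning
each is a small pool of sequences. The sketch below expands such a primer into a
regular expression and checks which template sequences it would match; the helper
name and test sequences are illustrative and not part of the original protocol.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex over plain A/C/G/T sequences."""
    return re.compile("".join(IUPAC[base] for base in primer))


primer_8f = primer_regex("AGAGTTTGATCMTGGCTCAG")

matches_a = primer_8f.fullmatch("AGAGTTTGATCATGGCTCAG") is not None  # M = A
matches_c = primer_8f.fullmatch("AGAGTTTGATCCTGGCTCAG") is not None  # M = C
matches_g = primer_8f.fullmatch("AGAGTTTGATCGTGGCTCAG") is not None  # G not in M
```

The same helper applies to 519R or 27Fmod; for a reverse primer, the template
strand would first need to be reverse-complemented.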

Bacterial culture of isolated strains. The isolated strains of Clostridia and
Erysipelotrichi were cultured in EG broth without horse blood under a strictly
anaerobic condition (80% N2, 10% H2, 10% CO2) at 37 °C in an anaerobic chamber
(Coy Laboratory Products). To prepare the bacterial mixture, bacterial strains
were individually grown in EG broth to confluence and mixed at equal amounts of
media volume.

Scanning electron microscopy. Scanning electron microscopy was performed by 
Filgen, Inc., Japan. The proximal colon was removed from +23-mix mice, cut 
open longitudinally, prefixed with 2% glutaraldehyde in 0.1 M phosphate buffer 
(pH7.4) for 24h at 4°C, and then postfixed with 2% osmium tetroxide for 1h at 
4 °C. Fixed samples were dehydrated for 5 min each in sequential baths of 50%, 
70%, 90% and 100% ethanol, inserted into a critical point dryer until dry and 
coated with osmium in an OPC-80N osmium plasma coater (Filgen). Scanning 
electron micrographs were taken by a JEOL JSM-6320F instrument. 

Measurement of organic acids. Organic acid concentrations in caecal contents
were determined by gas chromatography-mass spectrometry (GC-MS). Caecal
contents (10 mg) were disrupted using 3-mm zirconia/silica beads (BioSpec Products)
and homogenized in extraction solution containing 100 µl of internal standard
(100 µM crotonic acid), 50 µl of HCl and 200 µl of ether. After vigorous shaking
using a Shakemaster neo (Bio Medical Science) at 1,500 r.p.m. for 10 min, homogenates
were centrifuged at 1,000g for 10 min and then the top ether layer was
collected and transferred into new glass vials. Aliquots (80 µl) of the ether extracts
were mixed with 16 µl of N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide
(MTBSTFA). The vials were sealed tightly by screwing and heated at 80 °C for
20 min in a water bath, and left at room temperature for 48 h for derivatization.
The derivatized samples were run through a 6890N Network GC System (Agilent 
Technologies) equipped with HP-5MS column (0.25 mm × 30 m × 0.25 µm) and
5973 Network Mass Selective Detector (Agilent Technologies). Pure helium
(99.9999%) was used as a carrier gas and delivered at a flow rate of 1.2 ml min−1.
The head pressure was set at 10 p.s.i. with split 10:1. The inlet and transfer line
temperatures were 250 °C and 260 °C, respectively. The following temperature program
was used: 60 °C (3 min), 60-120 °C (5 °C min−1), 120-300 °C (20 °C min−1).
One microlitre quantity of each sample was injected with a run time of 30 min. 
Organic acid concentrations were quantified by comparing their peak areas with the 
standards. 
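
The oven program above can be checked against the stated 30-min run time with a little arithmetic. The sketch below is illustrative only; the helper name is hypothetical. The listed segments sum to less than the run time, and reading the remainder as a final hold at 300 °C is an assumption, not something the text states.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to ramp linearly between two oven temperatures."""
    return (end_c - start_c) / rate_c_per_min


initial_hold = 3.0                        # 60 °C held for 3 min
first_ramp = ramp_minutes(60, 120, 5)     # 60-120 °C at 5 °C per min
second_ramp = ramp_minutes(120, 300, 20)  # 120-300 °C at 20 °C per min

programmed = initial_hold + first_ramp + second_ramp
unaccounted = 30.0 - programmed           # stated run time minus listed segments

print(f"listed segments: {programmed:.0f} min, remainder: {unaccounted:.0f} min")
```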

Genome sequencing and gene prediction. The genome sequences of 17 Treg-cell-inducing
strains were determined by the whole-genome shotgun strategy using a
454 GS FLX Ti or Ion PGM sequencer. Each 1-5 µg of the genomic DNA was
sheared to obtain DNA fragments. Template DNA was prepared according to the 
supplier’s protocol. The generated sequence data were assembled using Newbler 
v2.8 software to obtain the draft genome sequences. All genome sequence data 
were deposited in DDBJ BioProject ID: PRJDB521-543. Protein-encoding genes 
were predicted using MetaGeneAnnotator software. Putative toxins and virulence
factors were searched using the BLASTP program and virulence factor
databases, VFDB (http://www.mgc.ac.cn/VFs/main.htm) and MvirDB
(http://mvirdb.llnl.gov), with the e-value cutoff of 1.0 x 10~?°, the identity >30% and
the length coverage >60%. 
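
The three hit-filtering criteria above can be sketched as a simple predicate. This is an illustration under stated assumptions, not the authors' pipeline: the `Hit` record and its field names are hypothetical, "length coverage" is assumed to mean aligned length divided by query length, and because the exponent of the e-value cutoff is not legible in this copy, the cutoff is left as a required argument and the value used in the demo is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTP hit (hypothetical record layout for illustration)."""
    query: str
    subject: str
    evalue: float
    pct_identity: float
    align_len: int
    query_len: int


def passes(hit, max_evalue, min_identity=30.0, min_coverage=60.0):
    """Apply e-value, identity (>30%) and coverage (>60%) thresholds."""
    coverage = 100.0 * hit.align_len / hit.query_len  # assumed definition
    return (hit.evalue <= max_evalue
            and hit.pct_identity > min_identity
            and coverage > min_coverage)


PLACEHOLDER_EVALUE = 1e-5  # placeholder only; not the cutoff used in the study

hits = [
    Hit("gene1", "toxA", 1e-30, 45.0, 250, 300),  # passes all three filters
    Hit("gene2", "toxB", 1e-30, 25.0, 250, 300),  # identity too low
    Hit("gene3", "toxC", 1e-30, 50.0, 100, 300),  # coverage too low
    Hit("gene4", "toxD", 1e-2, 50.0, 290, 300),   # e-value too high
]
kept = [h.subject for h in hits if passes(h, PLACEHOLDER_EVALUE)]
print(kept)
```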

Phylogenetic tree. Sequences concatenated with genes encoding 26 ribosomal 
proteins (large subunit L10, L11, L14, L16, L17, L19, L20, L23, L24, L29, L31, 
L32, L35, L7/L12, and small subunit S10, S12, S13, S15, S16, S17, S20, S20, S21,
S3, S4, S7, S8) predicted from the genomes of each strain were used to construct a
phylogenetic tree. The sequences of other bacterial species used for the tree construction
were obtained from the ribosomal multi-locus sequencing typing
(MLST) database. The calculation was performed using the MEGA v5.0 package
and the neighbour-joining method with a bootstrap of 1,000 replicates. 

Cognate antigen-driven Treg cell suppression assay. Preparation of antigens in
caecal contents was performed as previously reported. Caecal contents from
germ-free mice or +17-mix mice were collected and suspended in PBS (500 mg
ml−1); they were then filtered through a 70-µm mesh, and autoclaved at 121 °C for
15 min. To prepare antigens of bacterial components, the 17 strains of Clostridia
were cultured in vitro, mixed, washed and suspended with 1 ml PBS, and autoclaved
at 121 °C for 20 min. CD11c+ cells were isolated by FACSAriaIII from
spleens of SPF C57BL/6 mice and pulsed for 1 h with 0.5 µM SIINFEKL OT-I
peptide alone or in combination with either of 5 µM ISQAVHAAHAEINEAGR
OT-II peptide, autoclaved caecal contents from +17-mix mice or germ-free mice
(diluted 1:200), or autoclaved 17 strains of bacteria cultured in vitro (diluted
1:200). The antigen-pulsed CD11c+ cells were plated at 5 X 10* per well in 96-well
round-bottomed plates. CD8 T cells (Teff cells) were sorted from spleens of
SPF OT-I mice by FACSAriaIII and added to the CD11c+ cell-seeded plates at
5 X 10* per well. Then, CD4+CD25+ T cells (Treg cells) sorted from colonic lamina
propria of +17-mix mice or from spleens of SPF OT-II mice were added to the
culture at the indicated ratio of Treg to Teff cells. After 3 days, all cells were harvested,
stained with anti-CD4 and anti-CD8 antibodies, and analysed by
FACSAriaIII to enumerate the number of CD8 OT-I T cells.

Intestinal epithelial cell stimulation with caecal extracts and SCFAs. To prepare
caecal extracts, frozen caecal contents from germ-free, +17-mix or SPF mice
were thawed and well suspended in 4 volumes of sterile water. After centrifugation
(5,000 r.p.m. for 15 min), transparent supernatants were collected, filtered through
a 0.22-µm filter and used as caecal extracts. In some experiments, caecal extracts
were treated with proteinase K (2 mg ml−1, 55 °C for 1 h; Roche) or nuclease that
degrades all forms of DNA and RNA (125 unit ml−1, 37 °C for 4 h; Thermo), and
subsequently heated at 95 °C for 5 min to inactivate the enzymes. Human intestinal
epithelial cell lines (HCT8, HT29, Caco2, T84 and Colo205) and a mouse
epithelial cell line (CMT93) were obtained from ATCC and maintained at 37 °C
(5% CO2) in RPMI containing 10% heat-inactivated horse serum (Invitrogen).
Cells were cultured at 1.5 X 10° cells in 150 µl medium in 48-well plates and
stimulated with 4.5 µl caecal extract for 24 h. Human primary intestinal epithelial
cells were obtained from Lonza and maintained at 33 °C (5% CO2) in SmGM-2
medium containing 10% FBS (Lonza) for 1-2 weeks (6 x 10* cells in 48-well
plates). The medium was changed to 150 µl SmGM-2 containing 1% FBS before
stimulation. Caecal extracts (4.5 µl) were added to the culture and incubated for
24 h. Culture supernatants were collected and the level of the active form of TGF-β1
(Promega), TNF-α (R&D) and IL-6 (R&D) was measured by ELISA. To stimulate
epithelial cell lines with SCFAs, sodium salts of acetate, butyrate, propionate
and isobutyrate were dissolved in PBS. SCFAs were added to the culture individually
(final 0.5 mM) or in combination (final 0.5 mM each), and incubated for 24 h.

TNBS colitis. C57BL/6 SPF adult mice were orally inoculated with 17-mix or 
control PBS every 2 or 3 days for 3 weeks. 2,4,6-Trinitrobenzene sulphonic acid 
(TNBS)-induced colitis was induced by the intracolonic administration of 2.5 mg 
of TNBS (Sigma) in 50% ethanol into anaesthetized mice via a thin round-tip 
needle. The tip of the needle was inserted 4 cm proximal to the anal verge, and mice 
were held in a vertical position for 30s after the injection. All the mice were 
observed daily and were killed on day 4 after TNBS administration. Colons were 
fixed with 4% paraformaldehyde, sectioned, and stained with haematoxylin and 
eosin. The degree of inflammation in the distal part of colon was graded from 0 to 4 
as follows: 0, normal; 1, ulcer with cell infiltration limited to the mucosa; 2, ulcer 
with limited cell infiltration in the submucosa; 3, focal ulcer involving all layers of 
the colon; 4, multiple lesions involving all layers of the colon, or necrotizing ulcer 
larger than 1 mm in length. 

Allergic diarrhoea. BALB/c SPF adult mice were primed by subcutaneous injection
with 1 mg of OVA (Fraction V; Sigma) in 100 µl of Complete Freund Adjuvant
(CFA, DIFCO). One week after priming, mice were given 50 mg of OVA dissolved
in 200 µl of PBS by intra-gastric administration three times per week. 17-mix or
control PBS was orally administered to mice every 2 or 3 days for the entire period 
of the experiments. Diarrhoea was monitored visually 1h after each oral OVA 
challenge. Diarrhoea was scored as follows: 0, normal faeces (solid); 1, moist faeces 
(semi-solid); 2, mild diarrhoea (loose); and 3, severe diarrhoea (watery). Serum was 
collected from the cheek vein 1 h after the last OVA challenge and OVA-specific 
IgE levels were measured by ELISA (Chondrex). 

Adoptive CD4+ CD45RBhi T-cell transfer model of colitis. Germ-free CB.17
SCID mice were orally inoculated with SPF faeces together with or without 17-mix
of Clostridia. One week later, exGF SCID mice received 4 X 10° CD4+ CD45RBhi T
cells by intraperitoneal injection. Naive CD4+ CD45RBhi T cells were isolated from
spleens of SPF BALB/c mice by FACS sorting. All the mice were observed daily and 
were killed on day 14 after T-cell transfer. 

Pyrimidine homeostasis is accomplished by directed overflow metabolism

Marshall Louis Reaves, Brian D. Young, Aaron M. Hosios, Yi-Fan Xu & Joshua D. Rabinowitz


Cellular metabolism converts available nutrients into usable energy 
and biomass precursors. The process is regulated to facilitate effi- 
cient nutrient use and metabolic homeostasis. Feedback inhibition 
of the first committed step of a pathway by its final product is a 
classical means of controlling biosynthesis. In a canonical example,
the first committed enzyme in the pyrimidine pathway in Escherichia
coli is allosterically inhibited by cytidine triphosphate. The physiological
consequences of disrupting this regulation, however, have
not been previously explored. Here we identify an alternative regu- 
latory strategy that enables precise control of pyrimidine pathway 
end-product levels, even in the presence of dysregulated biosyn- 
thetic flux. The mechanism involves cooperative feedback regulation 
of the near-terminal pathway enzyme uridine monophosphate 
kinase. Such feedback leads to build-up of the pathway intermediate
uridine monophosphate, which is in turn degraded by a conserved
phosphatase, here termed UmpH, with previously unknown physiological
function. Such directed overflow metabolism allows homeostasis
of uridine triphosphate and cytidine triphosphate levels at the
expense of uracil excretion and slower growth during energy lim- 
itation. Disruption of the directed overflow regulatory mechanism 
impairs growth in pyrimidine-rich environments. Thus, pyrimidine 
homeostasis involves dual regulatory strategies, with classical feed- 
back inhibition enhancing metabolic efficiency and directed over- 
flow metabolism ensuring end-product homeostasis. 

The metabolic network of E. coli consists of approximately 1,000 
metabolites connected by around 2,000 enzyme-catalysed reactions.
Control of metabolite concentrations and fluxes occurs through the 
regulation of enzyme concentrations, activities and substrate occupan- 
cies. Metabolic control analysis provides a systematic framework for 
investigating the impact of particular enzymes on cellular metabolic 
activities. Studies modulating the concentrations of enzymes suggest
that control of metabolic flux is frequently distributed across
multiple enzymes, with demand for end product often having a key
role in controlling biosynthetic fluxes.

Consistent with distributed flux control, de novo pyrimidine bio- 
synthesis has been reported to be regulated both at the first committed 
pathway step, catalysed by aspartate transcarbamoylase (ATCase), and 
the previous step, catalysed by carbamoyl phosphate synthetase 
(CPSase), which also feeds arginine biosynthesis. The E. coli ATCase 
enzyme complex, which consists of six catalytic and six regulatory 
subunits, is subject to feedback inhibition by the pyrimidine end products
uridine triphosphate (UTP) and more strongly cytidine triphosphate
(CTP), and is activated by ATP. Its allosteric regulation
provided one of the first examples of feedback inhibition. CPSase is
feedback inhibited by the pyrimidine intermediate uridine monophosphate
(UMP).

To explore the physiological relevance of ATCase and CPSase allostery,
we created strains dysregulated for feedback control of ATCase (ΔpyrI),
CPSase (carB* (carB(S948F))) or both (ΔpyrI carB*) (Fig. 1a). We
then analysed the metabolite concentrations in these strains by liquid
chromatography-mass spectrometry (LC-MS)-based metabolomics. We
anticipated that such strains would have increased levels of pyrimidine
nucleotide triphosphates (NTPs), the pathway's terminal products;
however, UTP and CTP levels were steady in the absence of feedback
control. Instead, the only notable change we observed was markedly
increased uracil levels (Fig. 1b-d). To understand the robustness of
this pathway, we also conducted transcriptome analysis. Rather than
compensatory downregulation of pyrimidine biosynthetic genes in the
ΔpyrI carB* strain, we observed modest upregulation. In addition, we
observed enhanced expression of genes involved in arginine synthesis
(which also requires CPSase) and of the Rut pathway (a recently discovered
uracil-degradation pathway that is induced by uracil) (Supplementary
Fig. 1).

To assess the homeostatic capacity of pyrimidine metabolism in 
response to pyrimidine intermediate addition, we switched E. coli 
grown on minimal media to media containing orotate or uracil. 
Such pyrimidine upshift was sufficient to activate feedback inhibition 
of pyrimidine synthesis in the wild type, as evidenced by reduced 
N-carbamoyl-aspartate and dihydroorotate concentrations within 
5 min; this decrease did not occur in the ΔpyrI carB* strain (Fig. 1e
and Supplementary Fig. 2). In addition, pyrimidine upshift led to 
markedly increased uracil, and in the case of orotate addition, also 
UMP, and these increases were observed in all genetic backgrounds 
(Fig. 1e). Moreover, in both the presence and absence of feedback
control, there were only minor increases in UTP and CTP. 

Given the end-product homeostasis in the doubly feedback-dysregulated 
strain, even upon pyrimidine upshift, we sought to confirm that the 
ΔpyrI carB* strain is indeed defective in de novo pyrimidine biosynthetic
regulation. To this end, we measured the incorporation of isotopically
labelled uracil or orotate into UTP and CTP. Similar to the
wild-type strain, the ΔpyrI carB* strain imported the labelled intermediates
and incorporated them into end products. The residual production
of unlabelled end products, indicative of persistent de novo
synthesis, however, was higher in the ΔpyrI carB* strain (Fig. 1f and
Supplementary Fig. 2). This confirms that feedback regulation is func- 
tionally impaired. 

If the feedback-regulation mechanisms facilitated superior pyrimi- 
dine homeostasis, perhaps too subtle to be detected by our LC-MS 
methods, we expected that the feedback-dysregulated strain would 
have a growth defect relative to its wild-type parent. The ΔpyrI carB*
strain did not, however, exhibit impaired growth in various nutrient 
conditions, including rich medium, minimal media, media enriched 
in nucleotides or amino acids, media limited for nitrogen or phosph- 
ate, or periodic switches between these conditions (Fig. 1g). We did, 
however, detect a modest (~10%) growth advantage for the wild-type 
strain under anoxic conditions and in the presence of the uncoupler 
2,4-dinitrophenol. 

Slower growth only during energy limitation suggests that the feedback- 
dysregulated strain engages in a chronic, energy-wasting process. Given 
the observed uracil excretion, a likely candidate for this inefficient pro- 
cess is the degradation of UMP to uracil. Indeed, orotate addition results 
in both higher UMP and higher uracil (Fig. 1e), and when the orotate is

Figure 1 | Increased pyrimidine flux triggers overflow to uracil.

a, Canonical pyrimidine regulatory schematic. Carbamoyl phosphate is an 
intermediate in both pyrimidine and arginine synthesis. Carbamoyl aspartate is 
committed to pyrimidine synthesis. Carbamoyl phosphate synthetase (carAB) 
is feedback inhibited by UMP. Aspartate transcarbamoylase (pyrBI) is feedback 
inhibited by UTP and CTP and activated by ATP. b, c, Extracted ion
chromatograms showing uracil (b) and CTP (c) levels in wild-type (black) and
feedback-defective (ΔpyrI carB*; red) strains. d, Metabolite fold changes
relative to wild type in ΔpyrI, carB* and ΔpyrI carB* strains. Error bars denote
± standard error (n = 6). e, Metabolite fold changes at 5 min after addition of
orotate. Fold changes were computed relative to un-supplemented controls.
Error bars denote ± standard deviation (n = 2-3). Time course data appear in
Supplementary Fig. 2. f, Fraction of UMP and UTP derived from endogenous 


[15N]-labelled, the extent of uracil isotope labelling mimics that of UMP
(Supplementary Fig. 3). To elucidate the route from UMP to uracil, we
knocked out the genes known to catalyse the interconversion of UMP,
uridine and uracil: udk (uridine + GTP → UMP + GDP), upp
(uracil + PRPP → UMP + pyrophosphate (PPi)) and udp (uridine
+ inorganic phosphate (Pi) → ribose-1-P + uracil). Only the udp
knockout reduced uracil accumulation, decreasing it to 17% of the 
level normally found in the feedback-dysregulated background 
(Fig. 2b). We were able to eliminate uracil production more fully (to 
6%) by double deletion of udp and the cytosine deaminase codA. 
Following orotate addition, this double-deletion mutant feedback- 
dysregulated strain accumulates uridine instead of uracil (Fig. 2c and 
Supplementary Fig. 4). 

There is no enzyme with a well-described physiological function of 
UMP degradation to uridine. This reflects the lack of a straightforward 
genetic screen for such enzymes. Although often annotated as revers- 
ible, the Udk and Upp reactions are thermodynamically unfavourable 
sources and from exogenously added 15N-orotate. Error bars denote
± standard deviation (n = 2). The ΔpyrI (blue) and wild-type (black) lines lie
under the carB* (green) line. g, Competitive growth advantage of wild-type
versus ΔpyrI carB* appears selectively under energy limitation. Competitions
were performed in indicated media with lacZ” marker in wild-type and in 
feedback-dysregulated strain (wild-type lacZ) to control for effect of marker 
on growth. Calculations and experimental details are described in Methods. In 
brief, media were glucose-ammonia minimal media unless otherwise indicated. 
Alternative carbon and nitrogen sources were used in conditions K, L and O. 
+ indicates supplements to the minimal media; +/— indicates alternating 
supplementation/removal of indicated nutrient every 8 h. Grey ellipses mark 
95% confidence interval (n = 6-10). 


for the degradation of UMP (both ΔG_Udk and ΔG_Upp = +22 kJ mol−1
based on ΔG°′_Udk = +19.5 kJ mol−1 and ΔG°′_Upp = +20 kJ mol−1 and
[GTP] = 4.9 mM, [GDP] = 0.68 mM, [uracil] = 5.5 mM, [uridine] =
0.15 mM, [UMP] = 0.5 mM, [diphosphate] = ~1 mM and [PRPP] =
0.26 mM), and indeed, their genetic deletion does not prevent uracil
accumulation (Fig. 2b). On the basis of published in vitro enzymology
and homology, we identified ~15 putative UMP phosphatase,
hydrolase and nucleotidase genes with unknown physiological func- 
tions and screened deletion mutants for uracil levels following orotate 
addition. We found two genes whose deletion led to lower uracil and 
higher UMP levels than the wild type (Fig. 2c), indicating a possible 
role in UMP degradation in conditions of pyrimidine excess. We pro- 
pose that these genes, nagD and surE, be renamed umpH and umpG, 
respectively, to highlight their newly recognized activities in UMP 
degradation. UmpH and UmpG belong to the haloacid dehalogenase- 
like phosphatase family and SurE phosphatase families, respectively, 
whose members span taxa from bacteria to humans, but little is known 

Figure 2 | Pyrimidine overflow pathway is initiated by catabolism of UMP
by UmpH. a, Pathway schematic. b, Uracil excretion does not depend on the 
canonical pyrimidine interconversion enzymes Udk and Upp, but does require 
Udp. Excreted uracil accounts for approximately half of total uracil. Error bars 
mark standard deviation (n = 2-3). c, Uracil is produced by UMP degradation
following orotate addition. UMP is degraded to uridine by UmpH (also known 
as NagD) and UmpG (also known as SurE), and uridine to uracil by Udp. 
Isotopic tracing from orotate to UMP, uridine and uracil appears in 


about their physiological importance. We observed that knockout
of umpH, although not altering pathway end-product levels, impaired 
growth of E. coli upon orotate upshift, with the double deletion showing 
a stronger phenotype and end-product accumulation (Fig. 2c, d and 
Supplementary Fig. 4). Thus, UmpH, and to a lesser extent UmpG, 
function in a UMP-degradation pathway that is required for optimal 
growth in response to environmental pyrimidine intermediates. 
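
The free-energy estimate quoted earlier for running the Udk and Upp reactions in the UMP-degrading direction can be reproduced approximately from the listed standard free energies and concentrations, using ΔG = ΔG°′ + RT ln Q. The sketch below is a hedged check rather than the authors' calculation: the temperature (37 °C) is an assumption, the function name is hypothetical, and the Upp reaction is taken as UMP + diphosphate → uracil + PRPP.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3   # gas constant in kJ mol^-1 K^-1
T_KELVIN = 310.15           # assumed 37 °C; not stated in the text


def delta_g(dg0_prime, products_mM, substrates_mM):
    """ΔG = ΔG°' + RT ln Q, with concentrations given in mM."""
    q = (math.prod(c * 1e-3 for c in products_mM)
         / math.prod(c * 1e-3 for c in substrates_mM))
    return dg0_prime + R_KJ_PER_MOL_K * T_KELVIN * math.log(q)


# Reverse Udk: UMP + GDP -> uridine + GTP
dg_udk = delta_g(19.5, products_mM=[0.15, 4.9], substrates_mM=[0.5, 0.68])

# Reverse Upp: UMP + diphosphate -> uracil + PRPP
dg_upp = delta_g(20.0, products_mM=[5.5, 0.26], substrates_mM=[0.5, 1.0])

print(f"Udk: {dg_udk:+.1f} kJ/mol, Upp: {dg_upp:+.1f} kJ/mol")
```

Under these assumptions both values come out at roughly +21 to +23 kJ mol−1, in line with the quoted figure of about +22 kJ mol−1, so neither enzyme is a plausible route for net UMP degradation.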

The UmpH and UmpG phosphatases degrade UMP to uridine, 
which is further degraded by Udp to uracil and ribose-1'-phosphate 
(cytosine similarly liberated from CMP may be deaminated by CodA 
to uracil). This overflow pathway dissipates three high-energy phos- 
phate bonds (ATP equivalents) and one NADPH per uracil excreted 
aerobically, increasing to five ATP and inducing production of ubiqui- 
nol under anoxia, which may be problematic (Supplementary Fig. 5). 
The most straightforward way to control flux through this overflow 
pathway is through the concentration of its substrate UMP. In all con- 
ditions where we observe uracil excretion, we also observe increased 
intracellular UMP. The normal steady-state UMP concentration 
(0.052 mM) is below the Michaelis constant (Km) of UmpH for UMP
(0.12 mM), but rises to eight times the Km under conditions inducing
directed overflow metabolism (Fig. 2c). This suggests that UMP accu- 
mulation has a central role in triggering the overflow flux. 
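
To see why the rise in UMP matters, the reported concentrations can be turned into fractional enzyme velocities. This sketch assumes simple Michaelis-Menten kinetics for UmpH, which the text does not assert, and the function name is hypothetical; only the Km and UMP values come from the paragraph above.

```python
def fraction_of_vmax(substrate_mM, km_mM):
    """Michaelis-Menten rate as a fraction of Vmax: S / (Km + S)."""
    return substrate_mM / (km_mM + substrate_mM)


KM_UMP = 0.12             # mM, Km of UmpH for UMP
BASAL_UMP = 0.052         # mM, normal steady-state UMP
OVERFLOW_UMP = 8 * KM_UMP # mM, UMP at eight times the Km

basal = fraction_of_vmax(BASAL_UMP, KM_UMP)
overflow = fraction_of_vmax(OVERFLOW_UMP, KM_UMP)

print(f"basal: {basal:.2f} of Vmax, overflow: {overflow:.2f} of Vmax")
print(f"fold increase in UmpH flux: {overflow / basal:.1f}")
```

Under this simplified model UmpH runs at about 30% of its maximal rate at basal UMP and close to 90% during overflow, roughly a threefold increase in degradation flux with no change in enzyme level.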

We sought to identify the molecular underpinnings of UMP accumu- 
lation during growth in medium enriched in pyrimidine intermediates. 


Supplementary Fig. 3. Residual uracil in a Δudp ΔcodA strain indicates at least
one unannotated activity also makes minor contributions to uracil production.
End-product levels are not perturbed following orotate addition by deletion of
either ΔumpH or ΔumpG individually, but double deletion leads to increased
UTP. Error bars mark ± standard deviation (n = 3). d, Inability to degrade
UMP to uracil causes a growth defect upon orotate addition but increases final 
culture density (reproducible in two biologically independent experiments each 
of 6-8 technical replicates). D, attenuance. 


In analysing the regulatory architecture of the pathway, we noticed an 
additional potential feedback loop near the end of the pathway: UMP 
kinase (which catalyses the ATP-dependent phosphorylation of UMP 
to UDP), encoded by pyrH (Fig. 3a), is regulated in vitro by cooperative 
(ultrasensitive) inhibition by UTP (Hill coefficient of 2.8, inhibition
constant (Ki) of 154 µM). This inhibition can be partially overcome by
GTP activation, and thus, like the regulation of ATCase by CTP and
ATP, may function to achieve balance between pyrimidine and purine 
pools. Moreover, regulation at the end of the pathway would have the 
advantage of minimizing perturbations in UTP and CTP levels in 
response to alterations in any upstream pathway substrate (for 
example, not only of the de novo pathway, but also uracil, orotate, or 
their nucleosides). 

To test the importance of the cooperative inhibition of UMP kinase 
by UTP, we created strains expressing mutant forms of UMP kinase 
with reduced sensitivity to UTP: the N72A mutant has a higher Ki
(373 µM for UTP rather than 154 µM), and the D93A mutant has both a
higher Ki (332 µM) and less cooperative inhibition (Hill coefficient
1.6 rather than 2.8). Because UMP kinase catalyses an essential reaction,
we first transformed wild-type cells with low-copy plasmids
(pACYC) carrying a natively expressed pyrH allele (pyrHWT, N72A
or D93A) before knocking out genomic pyrH by transduction. We 
then assayed the growth and metabolic response of these strains to 
uracil and orotate. Although all three strains grew normally in the 

Figure 3 | Cooperative inhibition of UMP kinase by UTP maintains end-
product homeostasis. Variant pyrH alleles were expressed from its native 
promoter on low-copy plasmid (pACYC) with genomic pyrH removed. 

a, Schematic of downstream regulatory events in pyrimidine metabolism. 
Wild-type UMP kinase (PyrH) is feedback inhibited by UTP in a switch-like 
manner (high degree of cooperativity). Allosteric parameters of pyrH alleles 
appear in shaded box, with D93A lacking the switch-like behaviour. nH, Hill
coefficient. b, Altered expression or regulation of UMP kinase following orotate 
addition impairs growth. Defects also occur in pyrH'/pyrH diploid strains 
(Supplementary Fig. 6). c, Metabolite fold changes upon orotate addition in 
strains with altered UMP kinase expression or allosteric regulation reveal 
defects in both pyrimidine and purine homeostasis. Error bars mark standard 
deviation (n = 3). 


absence of pyrimidines and upon uracil addition (Supplementary Fig. 6), 
orotate addition inhibited growth of all three strains, particularly those 
with impaired inhibition of UMP kinase by UTP (Fig. 3b). The growth 
defect of the pyrHWT (pACYC::pyrHWT ΔpyrH) strain was presumably
due to mild overexpression of wild-type UMP kinase, as inducible
overexpression led to the same response (pAC24N::pyrHWT + 0.05 µM
isopropyl β-D-1-thiogalactopyranoside (IPTG); Supplementary Fig. 6).
Thus, proper control of UMP kinase protein levels is required for an 
optimal response to pyrimidine upshift. The cooperative feedback inhibi- 
tion of UMP kinase by UTP and CTP is also important, as impairment of 
such regulation led to a profound growth defect upon orotate addition 
(pACYC::pyrH(D93A) ΔpyrH, Fig. 3b).
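
The allosteric parameters reported above for the three pyrH alleles can be compared with a simple Hill inhibition function. This is a sketch under assumptions: the functional form 1 / (1 + ([UTP]/Ki)^n), with complete inhibition at saturating UTP, is a textbook simplification rather than the authors' fitted model, and the UTP concentrations evaluated are arbitrary illustrative values.

```python
def residual_activity(utp_uM, ki_uM, hill):
    """Fraction of UMP kinase activity remaining at a given UTP level."""
    return 1.0 / (1.0 + (utp_uM / ki_uM) ** hill)


# (Ki in µM, Hill coefficient) as reported in the text
ALLELES = {
    "WT": (154.0, 2.8),
    "N72A": (373.0, 2.8),
    "D93A": (332.0, 1.6),
}

for utp in (100.0, 300.0, 600.0):  # illustrative UTP levels, µM
    row = ", ".join(
        f"{name}: {residual_activity(utp, ki, n):.2f}"
        for name, (ki, n) in ALLELES.items()
    )
    print(f"UTP {utp:.0f} µM -> {row}")


def switch_span(hill):
    """Drop in activity as UTP goes from half the Ki to twice the Ki."""
    return residual_activity(0.5, 1.0, hill) - residual_activity(2.0, 1.0, hill)
```

In this simplified picture the wild-type enzyme is already strongly inhibited at UTP levels where both mutants retain most of their activity, and the lower Hill coefficient of D93A spreads the transition over a wider UTP range, consistent with the loss of switch-like behaviour described above.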

We also detected changes to the nucleotide pools of these pyrH 
mutants in response to orotate addition. In the wild type, orotate 
addition leads to UMP accumulation without major changes in NTP 
levels (Fig. 3c). Lack of proper UMP kinase regulation, however, results 
in increased UTP and decreased ATP. Thus, dysregulated pyrimidine 
metabolism saps purines, either by increasing their usage (for example, 
because higher pyrimidine NTPs lead to increased ribosomal RNA
biosynthetic rates), decreasing their synthesis (for example, by deplet- 
ing required substrates), or the combination of these factors. 

We designate the general mechanism, whereby feedback inhibition 
of a downstream pathway step leads to excretion of a pathway inter- 
mediate or by-product, directed overflow metabolism (Fig. 4a). Such 
overflow is triggered by excessive biosynthetic pathway flux and is 
carried out by a degradation pathway sensitive to levels of an accu- 
mulating biosynthetic intermediate. It is analogous to overflow in 
central carbon metabolism, wherein excessive sugar catabolism (typically
via glycolysis) leads to build-up of pyruvate, which may be
excreted as lactate, ethanol or acetate, depending on the organism. 
In the case of lactate excretion in humans, inhibition of pyruvate 
dehydrogenase (by high NADH or inhibitory phosphorylation) has
a similar role to the inhibition of UMP kinase in the pyrimidine path- 
way: it forms a choke point that instead directs the enzyme’s normal 
substrate towards by-product formation and excretion (Fig. 4b). In the 
case of pyrimidine biosynthesis, the cooperative inhibition of UMP 
kinase by UTP renders the directed overflow mechanism exquisitely 
precise in controlling UTP and CTP levels. 

Thus, pyrimidine homeostasis involves two strategies for regulation. 
The canonical feedback architecture contributes to metabolic efficiency 
by decreasing unnecessary de novo flux. Directed overflow metabolism
diverts excess flux to uracil, thereby ensuring end-product homeostasis
in response to altered availability of the full range of pathway
substrates and intermediates. These

Figure 4 | Directed overflow metabolism in biosynthesis is analogous to
central carbon overflow metabolism. a, Schematic of directed overflow 
metabolism as a biosynthetic regulatory mechanism. b, Schematic of overflow 
in central carbon metabolism, using the Warburg effect as a canonical example. 
PDH, pyruvate dehydrogenase; PDHK, pyruvate dehydrogenase kinase (which 
catalyses inhibitory phosphorylation of PDH); TCA, tricarboxylic acid. 

upstream and downstream regulatory mechanisms work in concert to
balance speed, efficiency and robustness. 


METHODS SUMMARY 


E. coli (parent strain NCM3722 (ref. 30)) were prepared for metabolite measure- 
ment using a filter culture technique: cell-laden nitrocellulose filters were grown 
on agarose plates. Pyrimidine upshift was accomplished by transferring filters to 
plates enriched with indicated supplements. Metabolism was quenched and metabolites
concomitantly extracted by placing filters into −20 °C solvent (40% methanol,
40% acetonitrile and 20% water with 0.1 M formic acid). This solvent mixture
reliably extracts nucleotides, nucleosides and bases without their interconversion
or degradation. Negative mode LC-MS and LC-MS/MS measurements were
performed as previously described. Deletion strains were created by P1 transduction
from the Keio deletion collection. Stable strains carrying carB* were generated
by electroporation and lambda Red-mediated recombination of PCR-amplified
S948F into a ΔcarB::kan strain (where kan denotes kanamycin resistance) followed
by plating and selection for prototrophs on minimal media. Growth was
assayed by absorbance at 600 nm in a 96-well format using a Biotek Synergy II 
plate reader. Competitive growth advantage was assessed by co-culture of a 
marked lacZ− (ΔlacZ::kan) and an unmarked lacZ+ strain, determination of relative
cell numbers after competitive growth by plating on MacConkey agar containing
1% lactose, and regression analysis of wild type to ΔpyrI carB* ratios. Unless
otherwise indicated, growth media for metabolic studies consisted of Gutnick
minimal media with 0.4% glycerol and 10 mM NH4Cl with triple-washed ultrapure
agarose and 0.8 mM orotate as needed.


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Strains. The wild-type E. coli K-12 strain NCM3722 was used in this study as wild 
type and as parent for all strains in this study because it lacks the rph-1 mutation 
carried by MG1655 that causes pyrimidine pseudo-auxotrophy from reduced 
expression of orotate phosphoribosyltransferase (pyrE). Single-gene deletion
mutants of NCM3722 were generated by P1 transduction of deletion alleles with
kanamycin resistance (kan) cassettes from the Keio collection and verified using
PCR with gene-specific primers. Multiple gene deletions were similarly created
serially using the FLP helper plasmid system (pCP20), which was also used to
eliminate kanamycin resistance cassettes to produce scarred deletions.

The carB(S948F) mutant was generated using the lambda-Red recombinase
system. A plasmid-borne S948F mutant allele of carB was amplified from
plasmid by PCR (carB forward primer, 5'-CGCATAAATCCCTGTTCGAC-3';
carB reverse primer, 5'-CCATTCGGCGATTAACAAGT-3'), gel-purified and
used to transform a NCM3722 ΔcarB::kan strain carrying pKD46 and induced
with arabinose as described previously. Recombinants were selected on minimal
media for pyrimidine and arginine prototrophy and subsequently cured of pKD46. 
Following screening for loss of all antibiotic resistance (ampicillin and kanamy- 
cin), the carB(S948F) (carB*) mutation was confirmed by sequencing. 

Strains natively expressing variant pyrH alleles in a ΔpyrH::kan background
were also created using the lambda-Red recombinase system. The pACYC::pyrH is
the ligation product of XbaI-digested, CAP-treated pACYC184 (all New England
Biolabs) with XbaI-digested wild-type pyrH generated by PCR from NCM3722
genomic template using primer set (pyrH forward primer,
5'-ACTCTAGACCAATGCAAAACCCGTCTAT-3'; pyrH reverse primer,
5'-GATCTAGAACTTACGCGGAATCTTACCC-3') (XbaI sites are underlined). Variant alleles
were produced through site-directed mutagenesis (Genewiz) and verified by sequencing.
NCM3722 carrying plasmid pACYC::pyrH and pKD46 was transformed with a 
kan cassette flanked by pyrH homology regions (created by PCR of a Keio collec- 
tion strain with PAGE-purified primer set (forward, 5'-TTGTAAATTCAGCTA 
ACCCTTGTGGGGCTGCGCTGAATTCCGGGGATCCGTCGACC-3'; reverse,
5'-ACCAAACTGCCTGCAACAATAACGCCTTATAACCAGTGTAGGCTGG 
AGCTGCTTCG-3')) to create a genomic ΔpyrH::kan allele. To prevent suppressing
mutations, the ΔpyrH::kan allele was introduced by P1 transduction to strains
carrying pyrH alleles on the pACYC plasmid and thereafter strains retained
plasmid-borne resistances in the absence of antibiotic.

Strains expressing pyrH from an IPTG-inducible promoter on pAC24N were
created by transformation with pAC24N::pyrH from the Aska collection after
removal of green fluorescent protein by NotI digestion and self-ligation.
Variant alleles were produced through site-directed mutagenesis (Genewiz) and 
verified by sequencing. Low-level expression was induced with IPTG as indicated. 
Media and bacterial culture. Gutnick glucose minimal medium (‘Glucose’, 
Gmin1) refers to a salts mixture with a 0.4% (w/v) glucose carbon source and
a 10 mM ammonium chloride nitrogen source unless otherwise stated. ‘Glycerol’
and ‘acetate’ refer to the identical Gutnick with 0.4% glycerol or 0.4% sodium 
acetate, respectively, substituted for glucose. Strains were also grown in media 
supplemented (+) with one of the following: adenine, arginine, guanosine, hypox- 
anthine, ornithine, uracil or 2,4-dinitrophenol (all 1 mM). Media with alternative 
nitrogen sources contained Gutnick salts, 0.4% glycerol and 10 mM total concen- 
tration of usable nitrogen. Phosphate-limited MOPS (TekNova) medium was 
prepared with 0.4% glycerol, 10 mM ammonium chloride and 0.5 mM monobasic 
potassium phosphate. Anoxia experiments were conducted in Gmin1 media at
37 °C inside a chamber containing approximately 90% nitrogen gas, 5% carbon 
dioxide and 5% hydrogen. Oxygen levels were maintained at 0 p.p.m. and rarely 
exceeded 100 p.p.m. +/— indicates alternating every 8 h between glycerol minimal 
media and enriched minimal media as indicated above. ‘LB, glucose’ indicates cells 
were alternated between LB and Gmin1 media every 8 h. Difco MacConkey agar
(Becton, Dickinson) was prepared according to the manufacturer’s instructions 
with 1% lactose as carbon source. 
Metabolite measurements. Cells were cultured and extracted in 40:40:20
methanol:acetonitrile:water with 0.1 M formic acid at −20 °C using the filter culture
method described previously. Low-molecular-weight metabolites were quantitated
using both LC-MS and LC-MS/MS at both steady state and following
addition of exogenous pyrimidines, including unlabelled and isotopically labelled
uracil and orotate. Absolute quantification of uracil and UMP was computed as in
ref. 23. LC-MS(/MS) data were analysed using MAVEN software. U-[¹⁵N]-tracers
(for example, uracil and orotate) were purchased from Cambridge Isotopes
Laboratories.


Competition assays. The lacZ⁻ (ΔlacZ::kan) derivatives of wild-type and
ΔpyrI carB* strains were prepared as described above. After overnight culture in
a 50:50 mix of medium for competition and Gmin1, lacZ⁺ and lacZ⁻ strains were
combined to equal proportions (based on attenuance D600) and were diluted to an
OD600 of 0.02 in medium for competition to create two paired cultures: wild type
lacZ⁻ with ΔpyrI carB* lacZ⁺ and wild type lacZ⁺ with ΔpyrI carB* lacZ⁻. Each
culture pair was grown at 37 °C and generally diluted every 8 h (12 h for media
with doubling time over 120 min) into fresh medium to D600 of 0.02. Some cultures
were diluted into alternating media as indicated above.

The ratio of wild type to ΔpyrI carB* of the culture was determined by regular
plating on MacConkey agar plates with 1% lactose (6-10 replicates). Cultures were 
diluted to ensure that approximately 100 colonies formed on each plate. 

Growth advantage was computed by a linear regression of the ratio of wild-type
and mutant strains. The ratio of wild-type to mutant cells in time, t, can be
described as $R(t) = R_0 \times 2^{t(1/W - 1/M)}$, where $R_0$ is the initial ratio, and W and
M are the doubling times for the wild type and the mutant, respectively.
Performing a linear regression of this equation (R software, http://cran.r-project.org)
in logarithmic space allows for computation of the growth advantage.
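
As an illustration of this kind of fit, the Python sketch below regresses log2 of the wild-type to mutant ratio against time by ordinary least squares. It assumes the ratio model R(t) = R0 × 2^(t(1/W − 1/M)), so that the fitted slope estimates 1/W − 1/M and the intercept estimates log2 R0. The doubling times, initial ratio and sampling times are hypothetical values chosen for the example, not data from this study, and the regression reported here was performed in R rather than Python.

```python
import math


def fit_log2_ratio(times_h, ratios):
    """Least-squares fit of log2(R) = log2(R0) + slope * t.

    Returns (slope, R0). Under the assumed model the slope equals
    1/W - 1/M, in doublings per hour.
    """
    ys = [math.log2(r) for r in ratios]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, 2.0 ** intercept


# Hypothetical inputs (not measurements): doubling times in hours.
W_HOURS = 1.0   # assumed wild-type doubling time
M_HOURS = 0.8   # assumed mutant doubling time
R0 = 1.0        # assumed initial wild-type:mutant ratio

# Simulate noise-free ratios at the sampling times, then recover the slope.
times = [0.0, 8.0, 16.0, 24.0]
ratios = [R0 * 2 ** (t * (1 / W_HOURS - 1 / M_HOURS)) for t in times]

slope, r0_fit = fit_log2_ratio(times, ratios)
print(f"slope = {slope:.3f} doublings per hour, fitted R0 = {r0_fit:.3f}")
```

A negative slope means the ratio of wild type to mutant falls over time, that is, the mutant doubles faster than the wild type under the assumed conditions.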

For 2,4-dinitrophenol, advantage was similarly computed from the growth rate
of each strain (lacZ⁺) individually in culture and plotted along the diagonal.
Microarrays. RNA purification, labelling, microarray measurement and data 
processing were performed as described in ref. 42. In brief, overnight cultures of the
wild-type and ΔpyrI carB* mutant were prepared as above in Gmin2 and grown
to mid-exponential phase (D600 of 0.4) and then mixed with RNAprotect Bacteria
Reagent according to manufacturer’s specifications (Qiagen). After 5 min at 25 °C,
the mixture was pelleted for 10 min at 5,000g, and RNA was extracted using the
Total RNA Purification kit (Norgen Biotek). Purified total RNA was treated for
30 min using E. coli poly(A) polymerase (New England Biolabs). RNA from the
wild-type and ΔpyrI carB* strains was labelled with Cy5 and Cy3, respectively,
using the Two-Colour Low Input Quick Amp Labelling Kit (Agilent). Samples
were hybridized to an E. coli Gene Expression Microarray (Agilent). Resulting data
were analysed using iPAGE and R software.
Growth assays. Absorbance measurements were obtained using a Biotek Synergy 
II plate reader (Biotek, Winooski, VT) in 96-well format at 600 nm using clear, flat- 
bottom plates with two independent biological replicates each consisting of six to 
eight technical replicates. Cultures included 0.1% Tween-20 to prevent clumping 
and were sealed with an air-permeable membrane. To facilitate comparisons, some
curves are aligned to D600 of 0.03.
Lars Guelen, Ludo Pagie, Emilie Brasset, Wouter Meuleman,
Marius B. Faza, Wendy Talhout, Bert H. Eussen, Annelies de 
Klein, Lodewyk Wessels, Wouter de Laat & Bas van Steensel 


Nature 453, 948-951 (2008); doi:10.1038/nature06947


CORRIGENDUM 
doi:10.1038/nature12324


Corrigendum: Immune surveillance by CD8αα+ skin-resident T cells in human
herpes virus infection


Jia Zhu, Tao Peng, Christine Johnston, Khamsone Phasouk, 
Angela S. Kask, Alexis Klock, Lei Jin, Kurt Diem, 
David M. Koelle, Anna Wald, Harlan Robins & Lawrence Corey 


Nature 497, 494-497 (2013); doi:10.1038/nature12110 


In this Letter, Fig. 4g and Supplementary Fig. 5f depict an analysis of 
H3K9me3 (not H3K9me2) ChIP data. In addition, three sentences in 
the third paragraph of the ‘Microarray data analysis’ section of the
Methods should read: “Gene expression data and H3K27me3 ChIP
data are from Tig3 lung fibroblasts. H3K9me3 ChIP data are from
human fetal lung fibroblasts... H3K4me2 and PolII ChIP data are
from WI38 human lung fibroblasts.”


ERRATUM 
doi:10.1038/nature12325


Erratum: Integrated genomic 
characterization of endometrial 
carcinoma 


The Cancer Genome Atlas Research Network 


Nature 497, 67-73 (2013); doi:10.1038/nature12113 


In the ‘Results’ section of this Article, the range in the sentence “The 
median follow-up of the cohort was 32 months (range, 1-19 months); 
21% of the patients have recurred, and 11% have died.” should have 
been 1-195 months. This error has been corrected in the HTML and 
PDF versions of the paper. 
In Fig. 2a of our Letter, the label ‘CXCR7’ should be ‘CXCR6’, and the
label ‘CXCR8’ should be ‘CXCR7’. This error has been corrected in the 
HTML and PDF versions of the paper. 


©2013 Macmillan Publishers Limited. All rights reserved 


GARY WATERS/IKON IMAGES/CORBIS 


CAREERS 


TURNING POINT An interest in climate 
change bridges science and policy p.245 


@NATUREJOBS Follow us on Twitter for the 
latest news and features go.nature.com/e492¢f 


NATUREJOBS For the latest career 
listings and advice www.naturejobs.com 


DATA-SHARING 


Everything on display


Researchers can get visibility and connections by putting 
their data online — if they go about it in the right way. 


BY RICHARD VAN NOORDEN 


Liz Wolkovich always felt she ought to
make her research data freely available
online. “The idea that data should be
public has been in the background through
my entire career,” she says.
Yet in 2003-09, while she was working on
her ecology PhD, there were few incentives for
her to share. Sharing would not help her to get
grants or publications, and although posting
data online was not unheard of, few researchers
actually did it, she says. Many preferred to
hang on to their hard-won field data, sharing
privately if they did so at all.

But after she earned her doctorate, Wolkovich
overcame her hesitation, thanks to a combination
of helpful colleagues, improved resources
and a discernible shift in the research community’s
attitude. So in 2010, through an online
data repository called the Knowledge Network
for Biocomplexity, Wolkovich released her doc- 
toral data set — the fruit of thousands of hours 
spent measuring the diversity of arthropods in 
56 experimental soil plots she had set up in the 
arid scrubscape of southern California. Since 
then, she has publicized all the data that she has 
collected, including a meta-analysis of 50 other 
studies that she examined to see how factors 
such as rising temperatures affect the life cycles 
of plants. Wolkovich, now at the University of 
British Columbia in Vancouver, Canada, says 
that she herself had never objected to sharing 
her results — she had just not known how to do 
so. She likes the fact that her data are now easily 
accessible to other researchers and anyone else 
who is interested. “It saves me so much time,” 
she says. 

Wolkovich is one of a number of early- 
career researchers who are enthusiastically 
posting their work online. They are publishing
what one online-repository founder calls
‘small data’ — experimental results, data sets,
papers, posters and other material from individual
research groups — as opposed to the ‘big
data’ spawned by large consortia, which usually
employ specialists to plan their data storage and 
release. The many resources now available give 
researchers options for where and how to post 
their data, releasing potentially fruitful data sets 
that used to be locked up in unpublished paper 
files, buried in journal-article appendices or 
hidden away on scientists’ hard drives. 


OPENING UP 
Open data-sharers are still in the minority 
in many fields. Although many researchers 
broadly agree that public access to raw data 
would accelerate science — because other 
scientists might be able to make advances not 
foreseen by the data’s producers — most are 
reluctant to post the results of their own labours 
online (see Nature 461, 160-163; 2009). When 
Wolkovich, for instance, went hunting for the 
data from the 50 studies in her meta-analysis, 
only 8 data sets were available online, and many 
of the researchers whom she e-mailed refused 
to share their work. Forced to extract data from 
tables or figures in publications, Wolkovich’s 
team could conduct only limited analyses. 
Some communities have agreed to share 
online — geneticists, for example, post DNA 
sequences at the GenBank repository, and 
astronomers are accustomed to accessing 
images of galaxies and stars from, say, the Sloan 
Digital Sky Survey, a telescope that has observed 
some 500 million objects — but these remain
the exception, not the rule. Historically,
scientists have objected to sharing for many 
reasons: it is a lot of work; until recently, good 
databases did not exist; grant funders were not 
pushing for sharing; it has been difficult to agree 
on standards for formatting data and the con- 
textual information called metadata; and there 
is no agreed way to assign credit for data. 

But the barriers are disappearing, in part 
because journals and funding agencies world- 
wide are encouraging scientists to make their 
data public. Last year, the Royal Society in 
London said in its report Science as an Open 
Enterprise that scientists need to “shift away 
from a research culture where data is viewed as 
a private preserve”. Funding agencies note that
data paid for with public money should be pub- 
lic information, and the scientific community is 
recognizing that data can now be shared digi- 
tally in ways that were not possible before. To 
match the growing demand, services are spring- 
ing up to make it easier to publish research 
products online and enable other researchers 
to discover and cite them. There are so many, 
in fact, that choosing where and how to publish 
data sets and other supplementary material can 
be confusing (see ‘Abundant options’).

“Lots of people are getting into data-hosting, 
and I think it will be tricky to decide where 
to put your data,” says Heather Piwowar, who
studies data-sharing for the US National Evo- 
lutionary Synthesis Center in Durham, North 
Carolina. 


SHARE AND SHARE ALIKE 

Although exhortations to share data often con- 
centrate on the moral advantages of sharing, 
the practice is not purely altruistic. Research- 
ers who share get plenty of personal benefits, 
including more connections with colleagues, 
improved visibility and increased citations. 
The most successful sharers — those whose 
data are downloaded and cited the most often 
— get noticed, and their work gets used. For 
example, one of the most popular data sets on
the multidisciplinary repository Dryad is about
wood density around the world; it has been 
downloaded 5,700 times. Co-author Amy 
Zanne, a biologist at George Washington Uni- 
versity in Washington DC, thinks that users 
probably range from climate-change research- 
ers wanting to estimate how much carbon is 
stored in biomass, to foresters looking for infor- 
mation on different grades of timber. “I would 
much prefer to have my data used by the maxi- 
mum number of people to ask their own ques- 
tions,” she says. “It’s important to allow readers 
and reviewers to see exactly how you arrive at 
your results. Publishing data and code allows 
your science to be reproducible.” 

Even people whose data are less popular can 
benefit, adds Piwowar. By making the effort to 
organize and label files so that others can under- 
stand them, scientists become more organized 
and better disciplined themselves, and can avoid 
confusion later on. “It is often very hard to find 
and understand your own work if you are looking
at it years from now,” says Piwowar. Scientists
might be inclined to stuff their data into
folders that can get lost and muddled — but if 
they store the files in an online repository, they 
are forced to curate and collate the data, she says. 

The fear of being scooped is a powerful 
inhibitor. But scientists can put an embargo on 
their data, so that only they can see the work 
until they are ready to make it public. And data 
sets are becoming increasingly citable, bring- 
ing their authors formal recognition: data 
published in a data journal, on Dryad or on 
the repository figshare.com are given a digital 
object identifier (DOI) that can be referenced 
in other publications. (Figshare is owned by 
Digital Science, a sister company to Nature 
Publishing Group.) 

Would-be sharers often worry that their 
data are too disordered or shoddy to release 
into the world. “I make my data available, and 
it can be a pain. I’m also scared and embarrassed
about errors — most of us are, especially
early-career scientists,” says Piwowar. “We 


WHERE AND HOW 


Abundant options 


Online data repositories are proliferating:
the searchable catalogue Databib lists
594 websites. Hundreds are specialists,
devoted to particular kinds of data. But 
general-purpose repositories do exist: they 
include Dryad, which many scientists use to 
store the data underlying their publications; 
GitHub, which is usually used to host software 
code and to collaborate on developing it, but 
also hosts other data; European Commission 
repository ZENODO; and figshare.com, a 
general repository for posters, papers and 
data sets that welcomes negative results
that would otherwise never be published.


Publishers have started to launch journals 
dedicated to data sets and descriptions of 
data, such as BioMed Central’s GigaScience. 
Some scientists post data on social networks 
such as ResearchGate or Academia.edu. 
Each discipline is evolving its own 
ways to structure data and metadata. 
In biology alone, biosharing.org lists 
some 530 standards, including MIAME 
(Minimum Information About a Microarray 
Experiment) and PDB (Protein Data Bank 
format). To avoid confusion, researchers 
should familiarize themselves with the best 
practices in their fields. R.V.N. 
don't yet have a culture of forgiveness around
that, unlike in computer programming, where 
everyone knows there are bugs in code.” She 
advises researchers to look into repositories
to get a sense of the quality standard for
experimental data. “It doesn’t have to be
perfect,” she says. “It’s probably less thorough
than you think.”

As sharing grows more common, scientists
may worry less about posting data sets.
“Ultimately, data will be so ubiquitous that
we will no longer be in a world where
researchers are so scared,” says Carl Boettiger,
an ecologist at the University of California,
Santa Cruz, who keeps his entire laboratory
notebook open online (see Nature 493, 711;
2013). “At the end of the day, science is a
social process. You will never get there hiding
yourself and your work,” he adds.


THE RIGHT PLACE 

Depositing data on a personal website is 
unlikely to be the best way to get it reused and 
cited. For a start, the website may not be around 
in five years, says William Michener, director 
of e-science initiatives at the University of New 
Mexico in Albuquerque. Michener is principal 
investigator for a multinational programme 
called DataONE, which is funded by the US 
National Science Foundation and promotes 
best practices to scientists as part of its aim to 
make data more discoverable. Journal publish- 
ers back up their research papers with the help 
of non-profit archiving services such as Portico 
and CLOCKSS, which are financed by partici- 
pating libraries and publishers, and which store 
material on a number of servers so that it will 
not disappear if a publisher goes bankrupt. 
Some data publishers have similar contingency 
plans, and Piwowar recommends looking into 
them. If no back-up plans are in place, she says,
“it suggests they haven't prioritized well enough 
how to steward their data”. 

Just as important as sharing data publicly 
is making sure that other researchers can 
understand them. Susanna-Assunta Sansone,
associate director of the Oxford e-Research 
Centre at the University of Oxford, UK, says 
that putting out data without noting what it 
means will ensure that “it’s not really reus- 
able”. To avoid this, researchers must choose 
appropriate metadata: descriptions of the 
data’s content and how they are arranged and 
set up. This type of curation is useful not just 
for human readers, but also for computer 
programmes that might be used to search
through or connect data sets. Intelligent
searches often rely on whatever descriptive 
metadata researchers have attached to the 
data. The metadata are read by an applica- 
tion programming interface (API), a set of 
commands that computer programmes use 
to interact with data stores and pull information
from them. Not all data repositories
use APIs; those that do not may not be the 
best places to store or release information, 
because it could be hard for anyone to find. 
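
One way to picture such descriptive metadata is as a small structured record stored alongside the data files. The Python sketch below is illustrative only: the field names, the list of required fields and all example values are hypothetical, and they do not follow any particular repository's schema or community standard. It shows how a record could be checked for completeness and written out as JSON, a format that both people and programs can read.

```python
import json

# Hypothetical minimal field set; real repositories and standards define their own.
REQUIRED_FIELDS = ("title", "creator", "description", "variables", "licence")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record describing an imaginary data file (all values are made up).
record = {
    "title": "Example soil-plot survey",
    "creator": "A. Researcher",
    "description": "Counts of organisms per plot; one row per plot visit.",
    "variables": {
        "plot_id": "identifier of the experimental plot",
        "visit_date": "date of the visit, ISO 8601 (YYYY-MM-DD)",
        "count": "number of individuals recorded",
    },
    "licence": "CC0",
}

# Serialize to JSON so the description can travel with the data files.
serialized = json.dumps(record, indent=2, sort_keys=True)

print("missing fields:", missing_fields(record))
print(serialized)
```

In practice a researcher would replace this ad hoc field list with whatever metadata standard is the norm in their field or is required by the chosen repository.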

Sites that are dedicated to hosting partic- 
ular types of data, such as DNA sequences, 
usually tell submitters what format is appro- 
priate. They may require data to be entered 
using an online form or following specific 
instructions. By contrast, generalist sites 
— such as institutional repositories, data 
journals or ventures similar to figshare.com 
— may have looser requirements. This has 
the potential to result in a blizzard of dif- 
ferent formats and descriptive tags, which 
could make discovering and reusing data 
more difficult, so researchers should pay 
close attention to the norms in their fields. 

Decisions about metadata standards 
should be made early in a research project, 
says Michener. DataONE has provided a 
primer on best practices, as has a tool called 
DataUp, run through the University of Cal- 
ifornia Curation Center in Oakland to help 
researchers to create data packages that are 
good enough to put online. Other aspects 
of data-sharing to consider early on include 
the information’s sensitivity and whether 
some parts must be stripped out to avoid, 
for example, identifying human study par- 
ticipants or the locations of endangered 
species. Researchers also need to be clear
about whether they will allow their data
sets to be used for any purpose, or whether
they would like to limit reuse to, for
example, non-commercial applications.
One widely understood way of documenting
reuse rights is by giving the data one of
several different Creative Commons licences.

Ultimately, says Michener, early-career 
researchers need to pay attention to new 
and developing ways to share data, and to 
the standardized formats that are emerging 
to make data easier to search and discover. 
Those who do not, he says, should rethink 
why they are doing research. “I think we are 
just now reconnecting with what science is 
all about — not just creating new knowl- 
edge, but also sharing the information and 
data that underpins those discoveries.”


Richard Van Noorden is a senior reporter 
at Nature. 


Kevin Gurney 


Sustainability scientist Kevin Gurney has 
been studying climate change for 27 years. 

He has worked in academia, public policy, 
non-governmental organizations (NGOs) and 
think tanks, and is currently at Arizona State 
University in Tempe. He describes how he 
navigates the science-policy divide. 


What convinced you to do a graduate degree? 
As an undergraduate, I worked at the Lawrence 
Berkeley National Laboratory in California, 
taking spectroscopic measures of greenhouse 
gases. Working with wonderful mentors who 
were excited about the science was infectious. 
Later I did a master’s in atmospheric science at 
the Massachusetts Institute of Technology in 
Cambridge and my focus shifted to chemistry 
and chlorofluorocarbons (CFCs) — greenhouse 
gases that also deplete Earth's ozone layer, and so 
have science and policy implications. 


How did you become active in policy? 
Regulation was ramping up to stop production 
of fully fluorinated CFCs, and industry was 
looking for alternatives. In 1986, I found that 
compounds called HCFCs, which contained 
less chlorine and thus caused less ozone deple- 
tion, still had the heat-trapping properties of 
CFCs. The policy implications were huge and 
there was so much misinformation. I was think- 
ing, people need to know about this. I got more 
involved with policy at that point. 


Why not go on immediately to pursue a PhD? 

I wanted to work on the political implications 
first. In 1992, I started working with the Insti- 
tute for Energy and Environmental Research 
in Takoma Park, Maryland. We sued the US 
Environmental Protection Agency to get it to 
regulate HCFCs, and we spread the word that 
HCFCs were not as environmentally friendly as 
manufacturers claimed. I also got involved in
discussions on the Montreal Protocol, the treaty 
to regulate ozone-depleting chemicals. I realized 
how ineffectively science and policy interacted. 
I got a master’s in public policy at the University 
of California, Berkeley, then a PhD in ecology 
at Colorado State University in Fort Collins. 
These days it is easier to get an interdisciplinary 
degree, but I tell my students that some degrees 
lack a rigorous science foundation. There is no 
substitute for a solid mathematics and physics 
background — it gives you credibility. 


How did you move from CFCs to carbon? 

I attended the negotiations in London and 
Copenhagen to amend the Montreal Protocol, 
laying out a plan to manage CFC phase-out. 


Once the treaty was set, I began to see that ris- 
ing carbon dioxide levels were an interesting 
problem. I maintained a personal network of 
contacts in NGOs, and many organizations 
were shifting to carbon dioxide and climate 
research for exactly the same reasons I was — it 
was quickly gaining traction. NGOs, including 
the US branch of the conservation group WWF
in Washington DC, paid for me to go to Kyoto 
Protocol negotiations, and I worked pro bono 
as a science consultant. I told the NGOs I was 
not going to give anyone just a line they wanted 
to hear. My PhD adviser let me take vacation to 
attend negotiations every four months. 


What is climate-change negotiation like? 

It is the most intense, pressure-filled world you 
can imagine. I was very involved with language 
in the Kyoto Protocol about the missing carbon 
sink — the carbon dioxide absorbed on land, 
which is not fully understood — and how to 
account for it. I learned a lot about law during 
my policy degree, which made me effective in 
crossing the divide between policy and science. 
You don’t have to dumb down; you have to learn
how legislators and policy-makers view science. 


You won a Faculty Early Career Development 
award from the US National Science 
Foundation in 2009. How are you using it? 

I'm doing a risky thing and getting involved with
citizen science to use Google Earth to identify 
power plants (see Nature http://doi.org/nb3; 
2013). Normally I would be too worried that it 
would fail to use funding dollars. But we have 
thousands of people involved and are add- 
ing hundreds of power plants to an emissions 
database that is part of NASA’s pilot carbon-monitoring
system. It is of interest to climate
scientists, social scientists and policy-makers.
SCIENCE FICTION


ONE MEAL A DAY 


BY MILO JAMES FOWLER 


First thing Monday morning, Howard
Schlange entered the living room of
his one-bedroom apartment to find a
well-dressed stranger sitting on the couch.

“Who are you?” Howard dropped back a
step, his heart lurching against his chest.

“It is time for you to return,” said the
stranger, remaining seated. Only his head
had turned, the hollow eyes in his pale face
fixed upon Howard. “You have been here
long enough.”

Howard couldn't argue with that. Ever
since construction on the freeway overpass
had begun in earnest last month, the whole
vibe of the neighbourhood had deteriorated.
“You look like an undertaker?”

“That does not change the situation.”

“No, I guess not.” Keeping a wary eye on the
intruder, Howard edged to the bar that separated
the living room from his kitchenette.

“Do not think to destroy me with a particle
beam. It obviously did not work the last time.”

“Obviously,” Howard repeated, grasping
his wallet and keys and holding them up. “I
have an appointment.”

“With destiny, yes.”

“No, with a dentist.” Howard licked his
lips. His heart rate hadn't decelerated from
high gear.

“Your grasp of this primitive language
eludes me. How do you manage to hold that
form?” A ripple coursed through the intruder’s
body as if there was a python coiled
where his intestines should have been. “It is
all I can do not to burst apart at the seams.”

Howard blinked. “Listen, I don’t have any
cash —”

“You will not need cash — whatever it is
— where we are going.”

True enough; the insurance company
would cover just about anything. “Wait.
You're coming with me?”

“No. You are coming with me.”

“That’s kind of what I said.” Howard
stuffed the wallet into his back pocket and
fiddled with the keys, jangling them against
each other.

“What a dreadful sound!” The stranger's
white, four-knuckled fingers flew to plug his
ears. “Stop it at once!”

“Sorry.” Howard dropped the keys into his
pocket and glanced at


sliding door. Both were locked up, just the 
way he’d left them the night before. “How did
you get in here exactly?” 

“Much better.” Mr Four Knuckles dropped 
his hands from his ears and stood suddenly, 
like a robot straightening itself. “Let us be 
on our way.” 

“Okay?” Howard moved towards the door. 

“Where are you going?” the stranger 
snapped. 

“The dentist. I told you —” 

“There are far greater matters at stake, 
Prince Orionhart!” 

“Orion-who?” 

“Your father’s kingdom is crumbling, Your 
Highness. The Crustaceoids lie at his very 
gates!” The impeccably dressed intruder 
came three steps forward, stomping his legs 
like stilts. “I realize we have had our differ- 
ences in the past, that you never appreciated 
my meddling in your affairs, but you have to 
know that my only aim was to keep you safe 
from harm. Of course, as is the case with all 
youth, there comes a time when you desire to 
strike out on your own, leaving your doting 
caretakers behind, so I can understand why 
you attempted to kill me —” 

“Huh?” 

“Have you no memory of it at all?” 

“It never happened — not with me, anyway.”
Howard tried to swallow but found a
dry tongue in the way. “I think you've got me 
confused with somebody else. Mr___?” He 
waited for the stranger to fill in the blank. 
Instead, the fellow started muttering to
himself: “I suppose it is possible that his 
brain chemistry could have undergone 
changes when he assumed the shape of one 
of these Earth creatures. I myself feel quite 
out of sorts. But it is highly unlikely that he 
would experience the sort of amnesia he 
appears to be exhibiting.”

Howard almost smiled. “Oh, I get it. You 
think I'm an alien?” This was so cool all of
a sudden! 

The stranger scowled. “The oxygen must 
be affecting you —” 

“You think I’m an alien prince? Holy cow!” 
Howard couldn't wipe the grin from his face. 
The weird visitor's skin began to contort as if
underneath it was a nest of spider eggs that 
were all hatching at once. “You want to take 
me to your mothership?” 

“I am not acquainted with the female, but if
you are referring to the Cosmic Conveyor —” 

Howard let out a whoop. “This is freakin’ 
awesome!” All thoughts of his appointment 
vanished from his mind. He stood to atten- 
tion, doing his best to contain himself. “All 
righty then. Take me to your leader.”

The stranger blinked. “You are my leader.” 

“Right. So, take me back to wherever. Out 
there.” He gestured vaguely at the ceiling. 

The stranger’s eyes squinted oddly as if 
the lower lids were attempting to devour his 
eyeballs whole. “I am beginning to have my 
doubts.” 

“I'm Prince Orionhead, ready to return to
my homeworld. Beam me up!” 

The stranger whipped a chrome pen from 
his breast pocket and pressed the tip with one 
of his long fingers. A white light enveloped 
him. “I believe I have made a mistake ...” 

“No mistake!” Howard charged into the 
light ... And ended up sprawled out on his 
couch, very much alone. 

“Hey, what about me?” he shouted at the 
ceiling. 

In the silence that followed, he heard only 
his stomach growling. How long had it been 
since he’d eaten last? After yesterday's mail
arrived? 

He looked down at his T-shirt and smirked 
at what appeared to be a pair of boa constric- 
tors squirming inside his protruding belly. 
Shrugging, he headed out for his appointment. 

Mailman had been a fine delicacy, but he 
couldn't wait to sample dentist.


Milo James Fowler is an English teacher by 
day and a speculative fictioneer by night. Find 
him at www.milo-inmediasres.blogspot.co.uk.